AI Built Into the Data Layer — Unlock Insights from Unstructured Text
Every enterprise sits on a goldmine of unstructured text — customer feedback, support tickets,
survey responses, CRM notes, compliance reports — yet most analytics platforms treat these columns
as opaque strings, invisible to dashboards and KPIs. Oracle Analytics Cloud changes that with
three AI functions built directly into the data layer: AI_GENERATE(), AI_AGG(),
and AI_FILTER().
These aren't bolt-on chatbots or separate AI tools that require data exports. They live inside OAC's
Workbook expression editor — right alongside SUM(), RANK(),
and FORECAST(). Write a plain English prompt as a quoted string argument, point it at
any text column, and the LLM classifies, summarizes, or filters your data in real time. No Python,
no ML pipeline, no data movement. The result is a new column in your workbook — structured,
queryable, ready for dashboards.
The natural language instruction is the 'prompt_string' argument — a quoted string
you type directly inside the function in OAC's Workbook expression editor.
There is no separate chatbot box or AI panel. You open Add Calculation in your workbook,
type the function, and the prompt is simply the '...' string you pass in as an argument:
AI_GENERATE(
    COMMENTS,                                                            ← the text column
    'Classify this feedback as Positive, Negative, Mixed, or Neutral.',  ← your plain English prompt
    OPENAI                                                               ← the registered model name
)
Change the prompt string, and the entire column's output changes — no retraining, no code, no pipeline. That's the core power of these functions.
In this guide, I'll walk you through everything you need to get this working: from registering
an AI model in OAC, setting up Select AI in Autonomous Database, all the way to running
AI_GENERATE(), AI_AGG(), and AI_FILTER() against real
customer data from Oracle's SH sample schema, which ships pre-loaded with Autonomous Database and required no additional setup. Every code snippet is copy-paste ready and has been validated live in OAC's expression editor (March 2026 release) — including the gotchas where the current documentation is inconsistent with what actually works. I've also included the troubleshooting section I wish I'd had the first time I set this up.
Prerequisites & Architecture Overview
Before diving into code, it helps to understand how all the pieces connect. OAC's Generative AI
functions don't call LLMs directly from the analytics layer — they route calls through a
registered model connection. When you write AI_GENERATE(COMMENTS, 'your prompt', OPENAI),
the name OPENAI is just the label you gave that registered connection. The function itself
doesn't know or care which provider is behind it — that's determined entirely by how the connection was set up.
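The indirection is easy to picture as a small registry: the expression carries only a label, and the registered connection decides which provider actually serves the call. A toy Python sketch of that dispatch (the names `registry` and `resolve_model` are illustrative, not OAC internals):

```python
# Toy model registry: the OAC expression only knows the label ("OPENAI");
# the registered connection behind it decides the actual provider and model.
registry = {
    "OPENAI":    {"path": "select_ai", "provider": "openai", "model": "gpt-4o"},
    "OCI_LLAMA": {"path": "oci_genai", "provider": "oci",
                  "model": "meta.llama-3-70b-instruct"},
}

def resolve_model(label: str) -> dict:
    """Look up the registered connection behind a model label."""
    if label not in registry:
        raise KeyError(f"No registered model named {label!r}")
    return registry[label]

# AI_GENERATE(COMMENTS, '...', OPENAI) routes via whatever OPENAI points at:
conn = resolve_model("OPENAI")
print(conn["provider"])  # the expression itself never names the provider
```

Swapping the label's target (say, re-registering OPENAI against a different profile) changes every expression that uses it, with no expression edits.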
There are two ways to register a model connection in OAC:
- OCI Generative AI Service — registered directly in OAC via OAC Console → Machine Learning → AI Services / AI Connections. Uses Oracle's hosted models (Llama 3, Cohere Command, etc.) with no external network calls. Best if you're already running OAC on OCI and want to stay within Oracle's infrastructure.
- Autonomous Database / Autonomous AI Lakehouse via Select AI — you configure an AI profile in Autonomous Database using DBMS_CLOUD_AI.CREATE_PROFILE(), then expose that database as a data source in OAC. The OAC model name is defined when you create the connection — it references the profile name used in the database-side calls. This path supports providers such as OpenAI, Anthropic, Google, Cohere, Azure OpenAI, and others — any LLM that ADW's DBMS_CLOUD_AI layer supports (availability varies by region).

📌 Naming note: Autonomous AI Lakehouse is the next-generation evolution and strategic rebranding of Autonomous Data Warehouse (ADW) — not a separate product. From an OCI provisioning perspective, what was previously the ADW workload type is now offered as the Autonomous AI Lakehouse workload, built on the same Autonomous Database platform. If you already know ADW, think of Autonomous AI Lakehouse as ADW plus expanded lakehouse and AI capabilities, with Oracle's strategic positioning shifted toward "AI data platform" rather than pure data warehousing. The DBMS_CLOUD_AI setup steps are identical regardless of which workload type you provisioned.
Whichever path you use to register the model, the expression syntax in OAC's workbook is
identical — AI_GENERATE(column, 'prompt', YOUR_MODEL_NAME). The model name in
the expression is just a reference to the registered connection. Step 0 below walks through
both registration paths step by step.
Both paths produce the same result from OAC's perspective: a registered model you can reference in expression editors across Workbooks, Datasets, Data Flows, and Semantic Models.
To follow along, you'll need:
- An Oracle Cloud Infrastructure (OCI) account with OAC provisioned
- ADMIN access to the OAC instance
- For OCI path: Access to OCI Generative AI Service in your compartment
- For ADW path: An Autonomous AI Lakehouse (built on Autonomous Database) with ADMIN access + a paid API account with your chosen LLM provider (OpenAI, Anthropic, etc.)
All examples in this guide use the SH (Sales History) schema, specifically the
SUPPLEMENTARY_DEMOGRAPHICS table, which contains a free-text COMMENTS
column — ideal for testing AI functions. The SH schema ships pre-loaded with Autonomous
Database; it was already present in my database and required no additional dataset setup for
this demo.
Step 0 — Register a Generative AI Model in OAC
This is the gate that everything else depends on. As of the March 2026 release, none of the AI functions
(AI_GENERATE, AI_AGG, AI_FILTER) will appear in your
expression editor until at least one model is registered. Here's how.
Navigate to your OAC Homepage → Generative AI Models (top navigation or hamburger menu depending on your version). Click Register Model. You'll be prompted to choose between two sources:
Path A — OCI Resource Connection (Recommended for OCI-native deployments)
This path stays entirely within the OCI network — no external API calls, no network ACL configuration required. It's the simplest option if you're running OAC on OCI.
Before OAC can call OCI Generative AI, your group must have the appropriate IAM policy. The minimum required policy to use hosted models is:
allow group <your-group-name> to use generative-ai-family in compartment <your-compartment-name>
The generative-ai-family aggregate resource-type covers chat, text generation,
text summarization, text embedding, models, imported models, dedicated AI clusters, endpoints,
private endpoints, and work requests. Access verbs are cumulative:
inspect → read → use → manage.
For just calling hosted inference (what OAC needs), use is sufficient.
manage is required only if you also need to create or delete custom models,
dedicated clusters, or endpoints.
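The cumulative verb chain can be sketched as a simple ordering check — a conceptual Python sketch of the rule stated above, not an OCI API:

```python
# OCI IAM access verbs form a strict inclusion chain:
# inspect < read < use < manage — each verb implies everything to its left.
VERBS = ["inspect", "read", "use", "manage"]

def verb_allows(granted: str, required: str) -> bool:
    """True when the granted verb covers the required level of access."""
    return VERBS.index(granted) >= VERBS.index(required)

# 'use' is enough for hosted inference; creating clusters would need 'manage'.
print(verb_allows("use", "use"))     # True
print(verb_allows("use", "manage"))  # False
```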
Official reference: Getting Access to Generative AI — OCI IAM Policies
- In the Register Model dialog, select OCI Resource Connection
- Choose the OCI Connection configured for your tenancy
- Select the Compartment where your Generative AI Service is enabled
- Pick a model from the available endpoints (e.g., meta.llama-3-70b-instruct, cohere.command-r-plus)
- Give the registered model a display name and click Save
OCI Generative AI communicates within the Oracle Cloud internal network. You skip the credential management, network ACL configuration, and external API billing that comes with third-party providers. For most enterprise OAC deployments, this is the recommended starting point.
Path B — Autonomous AI Lakehouse (Select AI)
If you want to use OpenAI's GPT-4o, Anthropic's Claude, or another provider — or if your
organization already has an Autonomous Database with Select AI configured — this path gives you
the most provider flexibility. OAC delegates all LLM calls through the ADW
DBMS_CLOUD_AI layer.
- In the Register Model dialog, select Autonomous AI Lakehouse
- Choose the Connection pointing to your ADW instance
- Select the AI Profile already configured in that ADW — this is the profile name from DBMS_CLOUD_AI.CREATE_PROFILE(), which you'll set up in the next section
- Give the registered model a display name and click Save
The AI Profile must already exist in the ADW before you can register it in OAC. Complete Step 1 (Select AI setup) first if you're using the ADW path.
Step 1 — Configure Select AI in Autonomous Database
This section follows the official Oracle documentation: Manage AI Profiles — Oracle Autonomous Database Serverless . Refer to that page for the full provider setup guides, profile attribute reference, and RAG-specific privilege requirements.
Skip this section if you're using Path A (OCI Resource Connection) exclusively. If you're routing through Autonomous Database, here's the complete setup — run as ADMIN unless noted.
DWROLE bundles the privileges needed for DBMS_CLOUD and DBMS_CLOUD_PIPELINE. Once the profile works end to end (verify it with a showsql or narrate call), return to OAC to register it.
Step 1: Create the Database User
-- Create a dedicated user
CREATE USER ADB_USER IDENTIFIED BY "&lt;YourStrongPassword&gt;";
GRANT DWROLE TO ADB_USER;
GRANT CONNECT TO ADB_USER;

-- If querying SH schema tables, grant SELECT from the table owner
ALTER USER SH IDENTIFIED BY "&lt;strong_password&gt;";

-- Connect as SH, then grant:
GRANT SELECT ON SH.CUSTOMERS TO ADB_USER;
GRANT SELECT ON SH.SUPPLEMENTARY_DEMOGRAPHICS TO ADB_USER;
GRANT SELECT ON SH.SALES TO ADB_USER;
GRANT SELECT ON SH.PRODUCTS TO ADB_USER;
GRANT SELECT ON SH.PROMOTIONS TO ADB_USER;
Passwords must be at least 12 characters with at least one uppercase letter, one lowercase letter, and one number. If you get ORA-28003, your password doesn't meet complexity requirements.
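A quick pre-flight check for the complexity rule above can save a round-trip to ORA-28003. This is a local convenience sketch — the database enforces its own password profile, which may be stricter:

```python
import re

def meets_adb_complexity(pw: str) -> bool:
    """Check the rule quoted above: >= 12 characters, with at least one
    uppercase letter, one lowercase letter, and one digit.
    (Pre-flight only — the database's password profile is authoritative.)"""
    return (len(pw) >= 12
            and re.search(r"[A-Z]", pw) is not None
            and re.search(r"[a-z]", pw) is not None
            and re.search(r"[0-9]", pw) is not None)

print(meets_adb_complexity("Short1a"))          # False — too short
print(meets_adb_complexity("LongEnoughPass1"))  # True
```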
Step 2: Grant EXECUTE on DBMS_CLOUD_AI
GRANT EXECUTE ON DBMS_CLOUD_AI TO ADB_USER;

-- Also grant DBMS_CLOUD if not already done via DWROLE
GRANT EXECUTE ON DBMS_CLOUD TO ADB_USER;
Step 3: Grant Network ACL Access
BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => 'api.openai.com',  -- change per provider
    ace  => xs$ace_type(
      privilege_list => xs$name_list('http'),
      principal_name => 'ADB_USER',
      principal_type => xs_acl.ptype_db
    )
  );
END;
/
Use the correct host for your provider. See the provider reference table below for a complete list.
Step 4: Create the Credential
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OPENAI_CRED',
    username        => 'OPENAI',  -- user-defined label, not an account
    password        => 'sk-...'   -- your actual API key
  );
END;
/
Credentials belong to the schema that creates them. If you create OPENAI_CRED as ADMIN and then try to use it as ADB_USER, you'll get ORA-20004: Credential does not exist. Always create credentials as the user who will run the queries.
Step 5: Create the AI Profile
The AI profile is the configuration object that ties everything together: which provider to call, which credential to authenticate with, which model to use, and — critically — which tables the LLM should reference when translating natural language into SQL.
BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'OPENAI',
    attributes   => '{
      "provider": "openai",
      "credential_name": "OPENAI_CRED",
      "model": "gpt-4o",
      "object_list": [
        {"owner": "SH", "name": "customers"},
        {"owner": "SH", "name": "supplementary_demographics"},
        {"owner": "SH", "name": "sales"},
        {"owner": "SH", "name": "products"},
        {"owner": "SH", "name": "promotions"},
        {"owner": "SH", "name": "countries"}
      ]
    }'
  );
END;
/
The object_list defines the tables the LLM will receive schema metadata for when translating natural language. Include only the tables relevant to your use case — a smaller, focused list produces more accurate SQL than a massive one. Replace SH with your actual schema owner if using your own tables.
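If you maintain several profiles, it can help to generate the attributes JSON rather than hand-edit it. A small Python sketch — `build_profile_attributes` is a hypothetical helper for assembling the string you'd paste into CREATE_PROFILE, not an Oracle API:

```python
import json

def build_profile_attributes(owner: str, tables: list[str]) -> str:
    """Assemble the attributes JSON for DBMS_CLOUD_AI.CREATE_PROFILE
    from a short, focused table list. Keeping the list small gives the
    LLM tighter schema metadata and more accurate generated SQL."""
    attrs = {
        "provider": "openai",
        "credential_name": "OPENAI_CRED",
        "model": "gpt-4o",
        "object_list": [{"owner": owner, "name": t} for t in tables],
    }
    return json.dumps(attrs)

payload = build_profile_attributes("SH", ["customers", "sales"])
print(json.loads(payload)["object_list"])
```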
The 5 Select AI Actions
Once the profile is set up, you can call it with five different actions:
| Action | What It Does | Returns |
|---|---|---|
| showsql | Generates SQL from your natural language prompt — does not execute it | SQL string |
| runsql | Generates and executes the SQL, returning result rows | Tabular data |
| narrate | Returns results as a natural language narrative explanation | Text |
| explainsql | Generates SQL and explains each clause in plain English | SQL + explanation |
| chat | General-purpose LLM conversation — no SQL or database data involved | Text |
Step 6: Test Your Setup
-- 1. See the generated SQL before running it
SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'how many customers exist',
         profile_name => 'OPENAI',
         action       => 'showsql'
       )
FROM DUAL;
-- Expected: SELECT COUNT("CUST_ID") AS "Total_Customers" FROM "SH"."CUSTOMERS"

-- 2. Run and get a narrative answer
SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'what are the top 3 customers in San Francisco',
         profile_name => 'OPENAI',
         action       => 'narrate'
       )
FROM DUAL;

-- 3. Shorthand syntax (SQL*Plus / SQLcl only, NOT in Database Actions Worksheet)
EXEC DBMS_CLOUD_AI.SET_PROFILE('OPENAI');
SELECT AI narrate what are the top 3 customers in San Francisco;
The SELECT AI shorthand syntax only works in SQL*Plus and SQLcl. In Database Actions SQL Worksheet, use DBMS_CLOUD_AI.GENERATE() instead — behavior can vary by tool and version, but the shorthand may produce ORA-00923: FROM keyword not found in some environments.
Supported AI Providers
The full per-provider setup instructions (API key locations, credential formats, and ACL hosts)
are documented in the official
Manage AI Profiles
page. The ACL host values for each provider are:
- api.openai.com (OpenAI)
- api.cohere.ai (Cohere)
- api.anthropic.com (Anthropic)
- generativelanguage.googleapis.com (Google)
- api-inference.huggingface.co (Hugging Face)
- bedrock-runtime.us-east-1.amazonaws.com (AWS Bedrock)
- OCI Generative AI — no ACL required
The setup pattern is conceptually identical for every provider — create a credential, grant network ACL access (if needed), then create the profile with the provider name. Only the host, credential format, and provider string change; exact provider availability may vary by region.
The Three LLM AI Functions at a Glance
With a model registered, three new functions become available in OAC's expression editors. They look like any other analytics function — but instead of computing a number, they call an LLM. Here's how they differ:
| Property | AI_GENERATE() | AI_AGG() | AI_FILTER() |
|---|---|---|---|
| Processes | One row at a time | Multiple rows per group | One row at a time |
| LLM calls | 1 per row | 1 per group | 1 per row |
| Returns | TEXT per row | TEXT per group | TRUE / FALSE per row |
| Requires group attribute | ✗ No | ✓ Yes (required) | ✗ No |
| Primary use cases | Sentiment labeling, classification, tagging, translation, enrichment | Group summaries, segment sentiment rollup, theme extraction | Intent detection, churn risk, relevance scoring, complaint routing |
| Available in Workbooks | ✓ | ✓ | ✓ |
| Available in Datasets | ✓ | ✓ | ✓ |
| Available in Data Flows | ✓ | ✗ | ✓ |
| Available in Semantic Models | ✓ | ✗ | ✓ |
AI_GENERATE() — Row-Level Text Enrichment
How It Works
AI_GENERATE() processes your dataset one row at a time. For each row, it
constructs a prompt by concatenating your template string with values from that row's columns,
sends that prompt to the registered LLM, and stores the response as a new column value.
The next row gets its own independent LLM call — there is no cross-row context.
Think of it as a CASE WHEN that uses natural language instead of boolean
conditions — and an AI instead of hard-coded rules.
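The per-row calling pattern can be simulated in a few lines of Python. This is a conceptual sketch of the semantics only — `fake_llm` is a keyword-rule stand-in for the registered model, and nothing here is OAC code:

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for the registered model: crude keyword rules, no real LLM."""
    text = prompt.lower()
    if "love" in text or "impressed" in text:
        return "Positive"
    if "expensive" in text or "unhappy" in text:
        return "Negative"
    return "Neutral"

def ai_generate(rows, prompt_template, model=fake_llm):
    """One independent model call per row — no cross-row context,
    mirroring how AI_GENERATE() processes a dataset."""
    return [model(f"{prompt_template}\n{row}") for row in rows]

comments = ["I adore the discounts, love them", "Your products are too expensive",
            "Just a question about hours"]
print(ai_generate(comments, "Classify this feedback:"))
# ['Positive', 'Negative', 'Neutral']
```

Because each row's call is independent, changing the prompt template changes every row's output on the next evaluation — the same "edit the prompt, not the pipeline" property the real function has.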
Syntax
AI_GENERATE(
    input_column,    -- bare column name (no quotes, no table qualifier)
    'prompt_string', -- instructions to the LLM
    model_name       -- registered model name (no quotes)
)

-- ✓ Confirmed working — classify customer comments with a sentiment label:
-- Name field (top of dialog): Sentiment
CASE
    WHEN COMMENTS IS NULL OR LENGTH(TRIM(COMMENTS)) = 0 THEN 'No Comment'
    ELSE AI_GENERATE(
        COMMENTS,
        'Classify this feedback as Positive, Negative, Mixed, or Neutral. Return only the label.',
        OPENAI
    )
END

-- ⚠ Common mistakes that throw nQSError: 26012:
-- ✗ AI_GENERATE("COMMENTS", ...)                     — quoted column name (treated as literal string)
-- ✗ AI_GENERATE("TABLE"."COMMENTS", ...)             — table-qualified column name
-- ✗ AI_GENERATE(COMMENTS, ..., OPENAI) AS SENTIMENT  — AS alias in expression body
-- ✗ AI_GENERATE(COMMENTS, ..., 'OPENAI')             — quoted model name
-- Use the Name field at the top of the dialog for your column label.
Full Example — Logical SQL
-- ✓ Basic: bare column names, model name unquoted, no alias
AI_GENERATE(
    COMMENTS,
    'Classify this customer feedback as Positive, Negative, Mixed, or Neutral. Return only the label.',
    OPENAI  -- unquoted model name
)

-- ✓ Recommended: wrap with NULL guard — prevents hallucination on empty rows
CASE
    WHEN COMMENTS IS NULL OR TRIM(COMMENTS) = '' THEN 'No Comment'
    ELSE AI_GENERATE(
        COMMENTS,
        'Classify this customer feedback as Positive, Negative, Mixed, or Neutral. Return only the label.',
        OPENAI
    )
END

-- ⚠ Do NOT include AS SENTIMENT — the Name field is your column label.
Real Data Example
The following shows AI_GENERATE() applied to real customer comments from the
SH schema's SUPPLEMENTARY_DEMOGRAPHICS table. Each row receives its own
independent LLM call — both the SENTIMENT and TOPIC_TAG columns are AI-generated outputs.
| CUST_ID | OCC. | COMMENTS | AI → SENTIMENT | AI → TOPIC |
|---|---|---|---|---|
| 100001 | Exec. | "Thanks for my affinity card. I love the discounts and now shop here for everything." | Positive | Loyalty Program |
| 100002 | Prof. | "The more times I shop, the more times I am impressed. Don't change anything." | Positive | Overall Experience |
| 100003 | Sales | "Good way to attract shoppers. After a month, I am ready to move on. Not enough variety." | Mixed | Product Variety |
| 100004 | Sales | "Thanks but even with discounts your products are too expensive. Sorry." | Negative | Pricing Complaint |
| 100005 | Crafts | "Affinity card is a great idea. Still too expensive. Tired of junk mail." | Mixed | Pricing + Spam |
| 100006 | Prof. | "Not going to waste time filling this three-page form. Lousy idea." | Negative | UX / Form Design |
| 100008 | Crafts | "I shop here a lot. I love your weekly specials." | Positive | Promotions |
| 100009 | Prof. | "Thank you! But very unhappy with all the junk mail you keep sending." | Mixed | Marketing / Spam |
| 100010 | Crafts | "Unhappy with service. Do not consider me loyal just because I have a card." | Negative | Customer Service |
| 100011 | Farming | "I love the new on-line documentation." | Positive | Digital / Website |
↑ Blue-italic columns are AI_GENERATE() outputs — one LLM call per row, zero hand-coded rules
If a row has a NULL or empty COMMENTS value, AI_GENERATE() still makes an LLM call — but with no actual content to classify. Most models default to Positive (the most benign label) when given a blank prompt. Wrap every AI_GENERATE() that reads from a nullable text column in a CASE WHEN ... IS NULL guard, as in the recommended example above.
Best Prompt Practices for AI_GENERATE()
- Be explicit about format. "Return only the label" or "Respond in JSON" dramatically reduces hallucination and parsing headaches.
- Include few-shot examples. If you need consistent output, add 2–3 examples directly in your prompt template before the actual row data.
- Keep prompts focused. Each AI_GENERATE() call gets its own LLM call — if you need two outputs (sentiment + topic), either make two AI_GENERATE() columns or have the LLM return both in a delimited string you parse later.
- Watch token costs. If your COMMENTS column contains very long text, consider truncating with
SUBSTR()before passing to AI_GENERATE().
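The truncation advice is easy to picture: SUBSTR caps the characters (and therefore roughly the tokens) sent per row. A Python analog of that guard — a conceptual sketch, with the 400-character cap chosen arbitrarily for illustration:

```python
def truncate_for_llm(text, max_chars: int = 400) -> str:
    """Python analog of SUBSTR(comments, 1, 400): cap the text passed to
    the per-row LLM call so token costs stay bounded on long comments."""
    if text is None:
        return ""
    return text[:max_chars]

long_comment = "x" * 1000
print(len(truncate_for_llm(long_comment)))  # 400
print(truncate_for_llm("short note"))       # short inputs pass through unchanged
```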
AI_AGG() — Group-Level Summarization
How It Works
AI_AGG() is fundamentally different from AI_GENERATE() in one key
way: instead of running a separate LLM call for each row, it collects all text values
within a group and sends them together in a single prompt. The result is one synthesized
response per group — not per row.
Think of it as GROUP BY for language. Just as SUM(revenue) rolls up
individual sales figures into a total per category, AI_AGG() rolls up individual
customer comments into a theme summary per segment.
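The "GROUP BY for language" behaviour can be simulated to make the call pattern concrete: all texts in a group are concatenated into one prompt, and the model is called once per group. A conceptual Python sketch (the stub model just reports how many comments it received in a single call):

```python
from collections import defaultdict

def ai_agg(rows, group_key, text_key, task, model):
    """One model call per group, mirroring AI_AGG(): every comment in a
    group is bundled into a single prompt before the call is made."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[text_key])
    return {g: model(f"{task}\n" + "\n".join(texts)) for g, texts in groups.items()}

rows = [
    {"occupation": "Sales", "comments": "Not enough variety."},
    {"occupation": "Sales", "comments": "Missed my discount."},
    {"occupation": "Exec.", "comments": "Love the loyalty program."},
]
# Stub model: counts the newline-separated comments handed to it in one call.
summaries = ai_agg(rows, "occupation", "comments", "Summarize:",
                   lambda p: f"summary of {p.count(chr(10))} comments")
print(summaries["Sales"])  # summary of 2 comments
```

Three input rows collapse to two outputs: one call covered both Sales comments, which is exactly why AI_AGG()'s cost scales with group count rather than row count.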
AI_AGG() groups by whatever dimension is present in your table visualization — it does not take a group attribute inside the expression itself. Drop only the grouping dimension (e.g. OCCUPATION) and the AI_AGG calc into your table. Do not add the COMMENTS column to the same table — this forces row-level output because every unique COMMENTS value becomes its own group, defeating the purpose entirely. This is also why AI_AGG() is only available in Datasets and Workbooks, not Semantic Models or Data Flows.
Syntax
AI_AGG(
    input_column,              -- the text column to aggregate across the group
    'task_description',        -- instructions for what to do with the grouped text
    model_string               -- registered model name (no quotes needed)
    [, 'optional_param_json']  -- optional: {"mode":"EXTERNAL","parameter":{"temperature":0.7,"max_token":205}}
)

-- Note: AI_AGG requires a group attribute to be present in your visualization
-- or query context (e.g., drop OCCUPATION into your table alongside this measure)
Full Example — Logical SQL
-- ✓ Workbook / Dataset expression editor — bare column names, no alias
AI_AGG(
    COMMENTS,
    'Summarize the key themes and overall sentiment from these customer comments. Identify the top 2 recurring issues. Keep your response to 2–3 sentences.',
    OPENAI
)

-- ⚠ Critical: Only put OCCUPATION + this calc in your table visualization.
-- Do NOT add COMMENTS to the same table — it breaks grouping.
-- Each unique COMMENTS value becomes its own group → row-level output.
-- No AS alias — Name field at top of dialog is your column label.
Real Data Example
Using the same SUPPLEMENTARY_DEMOGRAPHICS.COMMENTS data — but now instead of
classifying each row individually, we're synthesizing entire segments into one summary per group.
Each row in the result below represents many input rows collapsed into a single LLM call.
| OCCUPATION | AI_AGG() → OCCUPATION INSIGHT (one LLM call per group — real output) |
|---|---|
| ? | Key themes include store affordability, location convenience, product variety, privacy concerns, and satisfaction with affinity card discounts. Customers appreciate the concept of the affinity card but have mixed feelings about the store's pricing, location, product variety, and privacy policies. |
| Cleric. | Key themes include appreciation for the affinity card and discounts, frustration with remembering to bring the card while shopping, requests for discounts on past purchases, and concerns about the store's prices and variety. Sentiment is mixed — some customers expressing satisfaction and loyalty, while others are dissatisfied and considering shopping elsewhere. |
| Exec. | Key themes include dissatisfaction with pricing, service, and promotional materials, as well as appreciation for discounts and loyalty programs. Overall sentiment is mixed — some customers expressing frustration, others showing loyalty and satisfaction with the store's offerings. |
| Handler | Key themes are pricing and the comparison between the affinity card and competitors. Overall sentiment is that the competitor offers better prices without the affinity card, leading to potential dissatisfaction with the current card's pricing. |
| House-s | Key themes are the desire for variety and the importance of providing complete products. Sentiment is a mix of appreciation for the store's efforts to attract new shoppers, but also disappointment and frustration with the lack of variety and missing manuals. |
| Machine | Customers generally appreciate the Affinity Card program and the discounts it offers, with some expressing frustration over the store's prices and junk mail. The convenience of the card is appreciated, but some find it a hassle to remember to bring it when they shop. |
| Other | Key themes include inquiries about bulk purchase discounts, dissatisfaction with service, mixed feelings about the Affinity Card program, and appreciation for discounts. Sentiment is a mix of frustration with certain aspects of the store's operations but also appreciation for the discounts offered. |
| Prof. | Key themes include appreciation for the affinity card, frustration with remembering to bring it, concerns about privacy and junk mail, and dissatisfaction with prices and product variety. Sentiment is mixed — gratitude for the discounts, but unhappiness with various aspects of the shopping experience. |
| Sales | Key themes include frustration over missed discounts, convenience of the Affinity Card, privacy concerns, dissatisfaction with promotional mail, and requests for missing manuals and upgrades. Sentiment is mixed — gratitude for the card and positive experiences from some, unhappiness about privacy and missing items from others. |
| TechSup | Key themes include requests for removal from mailing lists, inquiries about discounts and shipping options, appreciation for discounts and affinity cards, and satisfaction with products purchased. Sentiment is a mix of gratitude, interest in discounts, and positive product feedback. |
| Transp. | Key theme is frustration over not receiving a discount on a recent purchase. Overall sentiment is disappointment and a sense of missed opportunity for savings. |
↑ All 11 OCCUPATION groups — actual output from OAC AI_AGG() against SH schema SUPPLEMENTARY_DEMOGRAPHICS. One LLM call per group.
When to Choose AI_AGG() Over AI_GENERATE()
- Use AI_AGG() when you want a single synthesized insight for a group — customer segment summaries, regional feedback themes, product category sentiment
- Use AI_GENERATE() when you need individual output for each record — classifying each ticket, tagging each review, translating each description
- Combine them when you want both: first use AI_GENERATE() to label each row, then use AI_AGG() to summarize labeled groups
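The "combine them" pattern is a two-stage pipeline: per-row labeling, then per-group rollup. A conceptual Python sketch of that flow (the lambda classifier stands in for the LLM; nothing here is OAC syntax):

```python
from collections import Counter

def label_rows(comments, classify):
    """Stage 1 — per-row labeling (the AI_GENERATE step)."""
    return [classify(c) for c in comments]

def rollup(groups_to_labels):
    """Stage 2 — per-group synthesis (the AI_AGG step),
    stubbed here as a label tally instead of an LLM summary."""
    return {g: dict(Counter(labels)) for g, labels in groups_to_labels.items()}

classify = lambda c: "Negative" if "expensive" in c else "Positive"
data = {"Sales": ["too expensive", "love it"], "Exec.": ["love it"]}
labeled = {g: label_rows(cs, classify) for g, cs in data.items()}
print(rollup(labeled))
# {'Sales': {'Negative': 1, 'Positive': 1}, 'Exec.': {'Positive': 1}}
```

In OAC, the equivalent is two calculated columns: an AI_GENERATE() label column, then an AI_AGG() over the labeled data with a grouping dimension in the visualization.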
AI_FILTER() — Natural Language Boolean Filtering
How It Works
AI_FILTER() evaluates a yes/no question for every row and returns TRUE
or FALSE. You can then use this boolean to filter a workbook visualization,
drive a conditional formatting rule, power an alert, or route records to a downstream process —
all without writing a single regex, keyword list, or custom ML model.
The power of AI_FILTER() is that it understands intent, not just keywords.
A customer saying "I'm ready to move on" and another saying "I won't be coming back" would both
return TRUE for the question "Is this customer at risk of churning?" — even though
they use completely different words.
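The signature quirks worth internalizing — predicate first, native boolean out — can be sketched in Python. This is a conceptual simulation with a keyword stub standing in for the model; real AI_FILTER() matches intent, not keywords:

```python
def ai_filter(predicate: str, text: str, model) -> bool:
    """Predicate string comes FIRST, mirroring AI_FILTER()'s argument order;
    returns a native bool, never the string 'TRUE'."""
    return model(predicate, text)

# Stub churn model: flags departure intent phrased in different ways.
churn_stub = lambda _question, t: any(
    phrase in t.lower()
    for phrase in ("ready to move on", "won't be coming back",
                   "stopped", "remove my name"))

q = "Is this customer at risk of churning?"
print(ai_filter(q, "I am ready to move on.", churn_stub))  # True
print(ai_filter(q, "I love the discounts!", churn_stub))   # False
```

Because the return value is already a bool, the result plugs straight into an `if` (or, in OAC, a bare `WHEN AI_FILTER(...) THEN`) with no string comparison.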
Syntax
Rule 1 — Argument order: AI_FILTER() puts the predicate string first, before the input column — the opposite of AI_GENERATE() and AI_AGG(). This is the most common mistake when switching between functions.
Rule 2 — Boolean return, no comparison needed: AI_FILTER() returns a native boolean, not a string 'TRUE'/'FALSE'. Write WHEN AI_FILTER(...) THEN — not WHEN AI_FILTER(...) = 'TRUE' THEN. The = 'TRUE' comparison will throw nQSError: 26012 (Near <=>: Syntax error). This is confirmed from live testing against OAC's expression editor.
AI_FILTER(
    'predicate_prompt',        -- ⚠ predicate comes FIRST: the natural-language condition to test
    input_string,              -- the column value to classify (per row)
    model_string               -- registered model name (no quotes needed)
    [, 'optional_param_json']  -- optional: {"mode":"EXTERNAL","parameter":{"temperature":0.7,"max_token":205}}
)

-- Returns TRUE when the model answers yes to the predicate question
-- Returns FALSE for anything else (no, uncertain, null)

-- Wrap in CASE WHEN to map TRUE/FALSE to human-readable labels:
-- ✓ Confirmed working: AI_FILTER returns a native boolean — no = 'TRUE' comparison needed
CASE
    WHEN AI_FILTER('Is this review positive?', text_column, OPENAI) THEN 'Positive'
    ELSE 'Needs Review'
END
Full Example — Logical SQL
-- ✓ Calc 1: Churn Risk Flag — confirmed working syntax (live-tested)
-- NULL guard: LENGTH(TRIM()) = 0 catches true NULLs AND whitespace-only strings (' ')
-- TRIM(COMMENTS) = '' also passes validation but misses single-space values → use LENGTH form
CASE
    WHEN COMMENTS IS NULL OR LENGTH(TRIM(COMMENTS)) = 0 THEN 'No Comment'
    WHEN AI_FILTER(
        'Does this comment suggest the customer may stop purchasing or is at risk of leaving?',
        COMMENTS,
        OPENAI
    ) THEN 'At Risk'
    ELSE 'Retained'
END

-- ✓ Calc 2: Price Complaint — separate calculated column
CASE
    WHEN COMMENTS IS NULL OR LENGTH(TRIM(COMMENTS)) = 0 THEN 'No Comment'
    WHEN AI_FILTER(
        'Is the customer specifically complaining about price or cost being too high?',
        COMMENTS,
        OPENAI
    ) THEN 'Price Complaint'
    ELSE 'OK'
END

-- ✓ Calc 3: Privacy Concern — separate calculated column
CASE
    WHEN COMMENTS IS NULL OR LENGTH(TRIM(COMMENTS)) = 0 THEN 'No Comment'
    WHEN AI_FILTER(
        'Is the customer expressing concern about privacy or not wanting to share personal information?',
        COMMENTS,
        OPENAI
    ) THEN 'Privacy Concern'
    ELSE 'OK'
END

-- ⚠ Key rules confirmed from live testing:
-- • AI_FILTER returns a native boolean — use WHEN AI_FILTER(...) THEN, NOT = 'TRUE'
-- • Model name is unquoted: OPENAI, not 'OPENAI'
-- • No AS alias in expression body — use the Name field at top of the dialog
-- • NULL guard (LENGTH(TRIM()) = 0) prevents hallucination on empty rows
Real Data Results — Confirmed Live Output
| CUST_ID | OCCUPATION | COMMENTS | Churn Risk (AI_FILTER) |
|---|---|---|---|
| 100782 | Prof. | Affinity card is a great idea. But your store is still too expensive. I am tired of your lousy junk mail. | Retained |
| 100792 | TechSup | Does this discount work if you live in Australia? I've moved and would like to purchase a few items as gifts but want the discount. Have you ever considered offering free shipping? | Retained |
| 100804 | Prof. | Dear store manager, please do not send me any more Affinity cards. I do not shop at your store very often and I feel that your new card imposes an invasion on my privacy. | Retained |
| 100807 | Prof. | (empty) | No Comment |
| 100813 | Exec. | I am unhappy with the service at your store. Do not consider me a loyal customer just because I use your Affinity Card. | Retained |
| 100826 | Sales | I love shopping with my Affinity Card! Thank you. I used it to purchase a new flat panel monitor. It is amazing. Thank you again! | Retained |
| 100851 | TechSup | Thank you, But please remove my name from your list. | At Risk |
| 100856 | Other | I purchased the new mouse pads and love them. I also purchased one for my sister and one for my brother. | Retained |
| 100864 | Other | I just purchased a computer from your store last month. Why didn't you offer this discount then? | Retained |
| 100893 | Other | I used to shop at your store, but have stopped now. I tried to return some electronic items and your store manager was a pain to deal with. | At Risk |
| 100909 | Cleric. | Thanks a lot for my new affinity card. I love the discounts and have since started shopping at your store for everything. | Retained |
| 100915 | ? | It is a good way to attract new shoppers. After shopping at your store for more than a month, I am ready to move on though. Not enough variety. | Retained |
| 100946 | Prof. | Thank you, But please remove my name from your list. | At Risk |
| 100966 | Cleric. | How much would it cost to upgrade my computer to the latest model you advertised this week? | Retained |
| 100974 | Prof. | It is a good way to attract new shoppers. After shopping at your store for more than a month, I am ready to move on though. Not enough variety. | Retained |
| 100981 | Other | I am unhappy with the service at your store. Do not consider me a loyal customer just because I use your Affinity Card. | Retained |
↑ Real output from OAC AI_FILTER() against SH schema SUPPLEMENTARY_DEMOGRAPHICS. Note: the model correctly identifies "remove my name from your list" and "have stopped now" as At Risk, while nuanced dissatisfaction without explicit departure intent returns Retained.
In the 16-row sample above, three customers are flagged At Risk, and the parallel Price Complaint calc isolates the pricing objections. These boolean columns now drive everything: a "High Risk" conditional format in a workbook, a CRM action queue filtered to churn=TRUE, or an email trigger — all without a single line of code beyond the expression itself.
Churn Prediction Without ML — Using AI Functions as a Lightweight Alternative
Traditional churn prediction requires significant investment: labelled training data, feature engineering, model selection, hyperparameter tuning, and ongoing retraining as customer behaviour shifts. OAC's AI Functions offer a pragmatic alternative that delivers useful churn signals in minutes, directly inside your existing analytics workbook — as long as your dataset includes a text column such as customer comments, support notes, or feedback responses.
All three AI Functions — AI_GENERATE(), AI_AGG(), and AI_FILTER() —
operate exclusively on text input. You cannot run sentiment analysis on a sales amount,
churn detection on a customer ID, or topic classification on an order date.
The function needs something to read — a text or text-castable column. You can use CAST(numeric_col AS CHAR) to convert non-text fields when needed.
Even the product marketing example in OAC's own docs uses a text column —
PROD_NAME contains values like "CD-ROM Software - Artist Series",
not a numeric product ID. The LLM needs words to reason over.
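To illustrate the CAST escape hatch mentioned above, here is a hypothetical sketch. AMOUNT_SOLD is a numeric measure in SH.SALES; the prompt wording is my own, not from Oracle's docs:

```sql
-- Hypothetical sketch: CAST gives the LLM text to read from a numeric column.
AI_GENERATE(
  CAST(AMOUNT_SOLD AS CHAR),
  'Is this purchase amount high, medium, or low for a retail transaction? Return only the label.',
  OPENAI
)
```

In practice, a bare number gives the model very little to reason over; concatenating a label such as 'Amount: ' || CAST(AMOUNT_SOLD AS CHAR) usually produces more sensible output.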
Common text columns that work well:
| Text Column Type | Example Source | Good For |
|---|---|---|
| Customer comments / feedback | CRM, survey tool, loyalty program | Sentiment, churn risk, topic |
| Support ticket descriptions | Zendesk, ServiceNow, JIRA | Urgency, category, resolution flag |
| Product names / descriptions | ERP, product catalog | Marketing copy, category inference |
| Employee performance notes | HCM, review system | Sentiment, theme extraction |
| Sales call / email notes | Salesforce, HubSpot | Deal risk, next action, sentiment |
| Returns / complaint reason codes | Order management system | Root cause classification |
If your fact tables have no text column: join to a related table that does — a CRM notes table, a support ticket feed, or a survey response dataset. Even a short free-text field like a reason code or a description column is enough to unlock all three functions.
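One way that join might look, sketched with hypothetical names (CRM_NOTES and NOTE_TEXT are illustrative, not part of the SH schema):

```sql
-- Hypothetical join: attach a free-text column to a fact table that has none.
SELECT s.CUST_ID,
       s.AMOUNT_SOLD,
       n.NOTE_TEXT        -- the text column the AI functions will read
FROM   SALES s
JOIN   CRM_NOTES n
  ON   n.CUST_ID = s.CUST_ID
```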
A churn prediction model ultimately answers one question per customer: "Is this person likely to leave?"
That is exactly what AI_FILTER() does — it evaluates a natural-language question row by row
and returns a boolean. No training data. No Python. No scheduled retraining jobs.
The LLM's reasoning acts as your classifier.
AI_FILTER() as a Churn Classifier
The Churn Risk column we built earlier is a working churn classifier.
The predicate "Does this comment suggest the customer may stop purchasing or is at risk of leaving?"
is semantically equivalent to the binary target variable a supervised ML model would learn to predict.
The difference: the LLM has already learned that language from billions of examples — you just have to ask.
```sql
-- Churn prediction in a single calculated column
-- Equivalent to a binary ML classifier, but with zero training data
CASE
  WHEN COMMENTS IS NULL OR LENGTH(TRIM(COMMENTS)) = 0 THEN 'No Signal'
  WHEN AI_FILTER(
         'Does this comment suggest the customer may stop purchasing or is at risk of leaving?',
         COMMENTS,
         OPENAI
       ) THEN 'At Risk'
  ELSE 'Retained'
END
```
AI_AGG() for Segment-Level Churn Intelligence
Where AI_FILTER() flags individual at-risk customers, AI_AGG() answers the
follow-up question: why are customers in a particular segment at risk? Group by occupation,
region, product category, or any dimension, and you get a natural-language churn narrative per segment —
the kind of insight that would normally require a data scientist to manually analyse cluster outputs.
```sql
-- Why are customers in each occupation group at risk?
-- Add OCCUPATION to the table viz; AI_AGG gives one summary per group
AI_AGG(
  COMMENTS,
  'Summarize the main reasons these customers appear to be at risk of churning. Be specific about themes like pricing, service, privacy, or product gaps.',
  OPENAI
)
```
How This Compares to Traditional ML Churn Models
| Dimension | Traditional ML (e.g. Logistic Regression / XGBoost) | OAC AI Functions Approach |
|---|---|---|
| Data required | Labelled historical churn data (hundreds to thousands of examples) | Any text column — comments, tickets, notes |
| Setup time | Days to weeks (feature engineering, training, validation) | Minutes — one calculated column in OAC |
| Requires Python / ML skills | Yes | No — natural language prompt only |
| Retraining needed | Yes — when customer behaviour shifts, the model drifts | No — update the prompt, not a model |
| Explainability | Low (black-box) or requires SHAP/LIME post-processing | High — AI_AGG generates a plain-language explanation per segment |
| Works on unstructured text | Requires NLP pipeline (tokenisation, embeddings, etc.) | Yes — natively, no preprocessing |
| Best for | High-volume, high-stakes production churn scoring at scale | Rapid signal discovery, qualitative churn analysis, smaller datasets |
A Three-Function Churn Dashboard
Combine all three functions in a single OAC workbook to build a complete churn intelligence view with no ML infrastructure:
| Visualization | AI Function | What It Tells You |
|---|---|---|
| Customer table with Churn Risk column | AI_FILTER() | Which individual customers are flagged as At Risk vs Retained |
| Sentiment column alongside Churn Risk | AI_GENERATE() | Whether At Risk customers are Negative, Mixed, or Neutral — tone context |
| Segment summary table (grouped by region, occupation, etc.) | AI_AGG() | The specific themes driving churn risk in each segment — actionable narrative |
AI Functions are best for qualitative signal detection on text data. Unlike a trained ML model, the LLM output here is not a calibrated probability score — it's a semantic judgment. If you need to score millions of customers, integrate churn probability into a CRM pipeline, or produce calibrated 0–1 probability scores for downstream decisioning, a trained ML model (Oracle Machine Learning in ADW, OML4Py, or an external model registered in OAC) is the right tool. Think of AI Functions as fast signal discovery — they help you understand what to model before you invest in building the model.
Availability Matrix — Where Can You Use Each Function?
Not all three functions are available everywhere: AI_FILTER() works in every surface, while AI_AGG() is limited to Datasets and Workbooks (it is excluded from Semantic Models and Data Flows, as explained below).
All three functions accept an optional 4th argument: a JSON string that controls model behavior. Note that the model name argument is unquoted (e.g., OPENAI not 'OPENAI') — this is consistent with how OAC's expression editor resolves registered model names. The mode field (INTERNAL or EXTERNAL) routes to OCI or external providers. temperature controls randomness (0 = deterministic, 1 = most creative). max_token caps response length. Example:
'{ "mode": "EXTERNAL", "parameter": { "temperature": 0.7, "max_token": 205 } }'
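Put together, a call using the optional fourth argument might look like this. The prompt is illustrative; the JSON fields are the ones described above:

```sql
AI_GENERATE(
  COMMENTS,
  'Summarize this feedback in one sentence.',
  OPENAI,
  '{ "mode": "EXTERNAL", "parameter": { "temperature": 0, "max_token": 100 } }'
)
```

Setting temperature to 0 is a sensible default for classification-style prompts, where you want the same row to produce the same label on every refresh.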
Why is AI_AGG() excluded from Semantic Models and Data Flows? Because AI_AGG()
requires a group attribute in every call — it must know how to partition rows into groups before
aggregating. Semantic Models and Data Flows don't support on-the-fly group specification the way
Datasets and Workbooks do. This mirrors the same constraint as other aggregation functions that
require an explicit GROUP BY context.
Troubleshooting — Common Errors and Fixes
Things don't always work on the first try. Here are the errors I've hit most often and exactly how to fix them.
Cause: You added an AS alias at the end of your expression in the calculated column editor. OAC does not allow aliases inside the expression body — the Name field at the top of the dialog is your column label.

What you typed:
AI_GENERATE(COMMENTS, 'Classify as Positive, Negative, Mixed or Neutral.', OPENAI) AS SENTIMENT
Fix — drop the AS alias entirely:
AI_GENERATE(COMMENTS, 'Classify as Positive, Negative, Mixed or Neutral.', OPENAI)
Cause: You compared AI_FILTER()'s output to the string 'TRUE' using = 'TRUE'. Unlike a string function, AI_FILTER() returns a native boolean — the = operator is unexpected after the closing parenthesis, which the parser reports as Near <=>.

What you typed:
```sql
CASE WHEN AI_FILTER('Is this at risk?', COMMENTS, OPENAI) = 'TRUE'
     THEN 'At Risk'   -- ✗ nQSError: 26012 Near <=>
     ELSE 'Retained'
END
```
Fix — drop the = 'TRUE' comparison entirely. Use AI_FILTER() directly as the boolean condition:
```sql
CASE WHEN AI_FILTER('Is this at risk?', COMMENTS, OPENAI)
     THEN 'At Risk'   -- ✓ Correct — AI_FILTER() is the condition itself
     ELSE 'Retained'
END
```
Cause: Rows where COMMENTS is NULL pass an empty string to the LLM. With no content to evaluate, the model defaults to the most neutral-sounding label. In testing with real SH schema data, NULL comments consistently returned Positive.

Fix — add a NULL guard in the expression:
```sql
CASE
  WHEN COMMENTS IS NULL OR TRIM(COMMENTS) = '' THEN 'No Comment'
  ELSE AI_GENERATE(
         COMMENTS,
         'Classify this feedback as Positive, Negative, Mixed, or Neutral. Return only the label.',
         OPENAI
       )
END
```
Cause: The COMMENTS column (or any other high-cardinality column) is included in the same table visualization as the AI_AGG calc. Since every row has a unique comment, each row becomes its own group — AI_AGG produces one LLM call per row, not per segment.

Fix — remove COMMENTS from the table viz:

Wrong: OCCUPATION + COMMENTS + Occupation Insight
Right: OCCUPATION + Occupation Insight
Cause: The external provider is rejecting requests, typically because the API key is invalid or expired, or the account has no remaining credits.

Fix:
- Verify your API key is active at your provider's dashboard
- Confirm credits/balance are loaded on the same project as the API key (OpenAI separates keys from credits by project)
- Drop and recreate the credential with a fresh key:
```sql
BEGIN
  DBMS_CLOUD.DROP_CREDENTIAL('OPENAI_CRED');
END;
/

BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OPENAI_CRED',
    username        => 'OPENAI',   -- note: a quoted string in PL/SQL
    password        => 'sk-your-new-key'
  );
END;
/
```
Cause: Credentials are owned per database user; the credential was created under a different account than the one you're connected as.

Fix: Either connect as the user who created the credential, or recreate the credential under the current user:
```sql
-- Verify which user you're connected as
SELECT USER FROM DUAL;

-- Create the credential under this user
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OPENAI_CRED',
    username        => 'OPENAI',   -- note: a quoted string in PL/SQL
    password        => 'sk-...'
  );
END;
/
```
Cause: You used the SELECT AI shorthand syntax in the Database Actions SQL Worksheet, which doesn't support it.

Fix: Use DBMS_CLOUD_AI.GENERATE() instead in the Worksheet:
```sql
-- ❌ Doesn't work in Database Actions Worksheet:
SELECT AI how many customers exist;

-- ✓ Use this instead:
SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'how many customers exist',
         profile_name => 'OPENAI',
         action       => 'runsql'
       )
FROM DUAL;
```
Cause: The connected user lacks EXECUTE privileges on the Select AI packages, or SELECT access on the SH tables.

Fix (run as ADMIN):
```sql
GRANT EXECUTE ON DBMS_CLOUD_AI TO ADB_USER;
GRANT EXECUTE ON DBMS_CLOUD    TO ADB_USER;

-- For granting SH table access when ADMIN doesn't have GRANT OPTION:
ALTER USER SH IDENTIFIED BY "<password>";
-- Then connect as SH:
GRANT SELECT ON SH.CUSTOMERS TO ADB_USER;

-- Last resort for demos only — DO NOT use in production:
GRANT PDB_DBA TO ADB_USER;
```
Cause: The database can't reach the provider's endpoint because the network ACL allowing outbound calls to that host is missing.

Fix — verify the ACL exists (as ADMIN):
```sql
SELECT host, acl
FROM   DBA_NETWORK_ACLS
WHERE  host = 'api.openai.com';

-- If nothing returned, re-run the APPEND_HOST_ACE step from §04
```
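If the ACL is indeed missing, the §04 grant can be recreated along these lines. This is a hedged sketch, assuming the ADB_USER account used throughout this guide and the http privilege; adapt the principal name to your own setup:

```sql
-- Sketch: re-grant outbound HTTP access to the provider host for ADB_USER.
BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => 'api.openai.com',
    ace  => xs$ace_type(
              privilege_list => xs$name_list('http'),
              principal_name => 'ADB_USER',
              principal_type => xs_acl.ptype_db
            )
  );
END;
/
```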
Cause: The AI profile references a model name the provider doesn't recognize.

Fix: Drop the profile, then recreate it with a valid model name, or omit the "model" attribute entirely to let ADB use the provider default:
```sql
EXEC DBMS_CLOUD_AI.DROP_PROFILE('OPENAI');

BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'OPENAI',
    attributes   => '{
      "provider": "openai",
      "credential_name": "OPENAI_CRED",
      "model": "gpt-4o",
      "object_list": [{"owner": "SH", "name": "customers"}]
    }'
  );
END;
/
```
Quick Verification Checklist
Run these queries to confirm everything is in place before testing AI functions:
```sql
-- 1. Check your execute privileges
SELECT table_name AS package_name, privilege
FROM   USER_TAB_PRIVS
WHERE  table_name IN ('DBMS_CLOUD_AI', 'DBMS_CLOUD');

-- 2. Check credentials
SELECT credential_name, username, enabled
FROM   USER_CREDENTIALS
WHERE  credential_name = 'OPENAI_CRED';

-- 3. Check AI profiles
SELECT profile_name, status FROM USER_CLOUD_AI_PROFILES;

-- 4. Verify active session profile
SELECT DBMS_CLOUD_AI.GET_PROFILE() FROM DUAL;
```
Clean Slate — Remove Everything and Start Over
```sql
-- 1. Clear profile from current session
BEGIN
  DBMS_CLOUD_AI.CLEAR_PROFILE;
END;
/

-- 2. Drop the AI profile
EXEC DBMS_CLOUD_AI.DROP_PROFILE('OPENAI');

-- 3. Drop the credential
BEGIN
  DBMS_CLOUD.DROP_CREDENTIAL('OPENAI_CRED');
END;
/

-- 4. (Optional) Drop the user entirely — run as ADMIN
-- WARNING: CASCADE permanently deletes all user objects
DROP USER ADB_USER CASCADE;
```
Wrapping Up
Oracle Analytics Cloud's Generative AI functions represent a genuine shift in how analytics
platforms integrate AI — not as a bolt-on chatbot or a separate tool you export data to, but
as native expression-editor functions that sit alongside SUM() and
RANK() in your existing workflows.
To recap the key points:
- Model registration comes first — no functions appear until at least one model is registered via OCI Resource Connection or Autonomous AI Lakehouse
- Select AI (ADW path) gives you provider flexibility — OpenAI, Anthropic, Google, Cohere, and more — but requires the credential + network ACL + profile setup covered in §04
- AI_GENERATE() is your row-by-row enrichment function — one LLM call per row, returns text
- AI_AGG() is your group synthesizer — one LLM call per group, requires a group attribute, only available in Datasets and Workbooks
- AI_FILTER() is your intent-aware boolean filter — one LLM call per row, returns TRUE/FALSE, available everywhere
- The troubleshooting section covers the errors you're most likely to hit, with exact fixes
The real value isn't in the individual functions — it's in what becomes possible when you combine them. Classify every row with AI_GENERATE(), summarize the classified groups with AI_AGG(), filter to at-risk records with AI_FILTER(), then surface all three as interactive visualizations in a single OAC workbook. No Python. No external pipeline. No data movement.
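As a compact recap, the three calculated columns that power that workbook, each shown earlier in this guide, are simply:

```sql
-- 1. Row-level sentiment (AI_GENERATE)
AI_GENERATE(COMMENTS, 'Classify this feedback as Positive, Negative, Mixed, or Neutral. Return only the label.', OPENAI)

-- 2. Row-level churn flag (AI_FILTER, used directly as a boolean)
AI_FILTER('Does this comment suggest the customer may stop purchasing or is at risk of leaving?', COMMENTS, OPENAI)

-- 3. Per-segment narrative (AI_AGG; add a dimension like OCCUPATION to the viz)
AI_AGG(COMMENTS, 'Summarize the main churn themes for these customers.', OPENAI)
```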
Once you've registered your model in OAC, head to the Hands-On Exercise to build it yourself step by step — sentiment classification, churn risk flagging, group summarization, and a complete dashboard assembly with a reference DVA file.
If you run into anything not covered here, the Oracle documentation on OAC Generative AI and the Select AI reference are the best next stops. And as always — feel free to reach out on LinkedIn or drop a comment below.
The opinions and views shared in this blog are solely my own and do not represent the official stance or policy of Oracle Corporation or any organization I am associated with. This content reflects my personal insights based on hands-on experience and should be interpreted as such.