What It Takes to Be the Most Accurate, Secure, and Trusted AI Data Analytics Platform

Image of the ERD diagram

Our mission is to make it easier for everyone, everywhere, to access reliable and trusted data quickly.

To support that goal, we just shipped several new features in our latest release. In this article, I’ll go into detail on how we are making our platform more accurate, secure, and trusted.

Before diving in, a quick shout-out for a podcast I was on recently! If you want to hear me explain my vision for Basejump AI and how it all works in detail, check it out here.

I also want to mention our webinar: if you want to learn more about AI Data Agents and what to consider before ‘hiring’ your first one, save your spot here. The webinar is this month, on April 30th. Come talk to my co-founder and me about our data analytics platform!

Basejump AI webinar

Latest Release Summary

Our AI Data Agent has access to our documentation, so I’ll just let Basejump explain what’s new in this latest release.

Screenshot from the Basejump AI chat explaining our latest release

We improved AI accuracy, added thumbs up/down feedback on chat messages, and added a private storage option for data retrieved from your database. I go into more detail below, but here is the TLDR.

TLDR

🎯 Accuracy improved: the AI can no longer use hallucinated tables, columns, or filter values
👍 Users can now provide feedback to the AI using thumbs up/down on a message
🗄️ Users can store saved data results in their own private storage

Avoiding Hallucinated Tables, Columns, and Filters

Proper controls are essential for an accurate text-to-SQL system. One of the most necessary is preventing the AI from using tables, columns, or filter values that do not exist. This is part of the Basejump AI verification engine, which ensures that the information retrieved is both accurate and trusted.

Hallucinated Tables and Columns

In Basejump AI, you start by indexing your database, which retrieves the tables visible to the database role you provided. From the metadata page, you can then choose to ignore any tables and columns you don’t want the AI to see.

Screenshot showing ignore options in Basejump metadata

Once a table or column is ignored, the AI can no longer see it. In the past, however, our AI sometimes guessed that these tables existed and tried to query them anyway. With our latest update, that is no longer possible.

Guaranteeing that ignored tables and columns cannot be queried is a powerful control, and it is our latest accuracy improvement.
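The post doesn’t describe how Basejump enforces the ignore list internally. SQLite offers one illustrative mechanism: an authorizer hook, exposed in Python as `sqlite3.Connection.set_authorizer`. It can reject any statement that reads a hidden table or column before that statement runs. This is a minimal sketch of that idea and is not Basejump’s implementation. The ignore lists, the `is_allowed` helper, and the demo tables are all hypothetical.

```python
import sqlite3

# Hypothetical ignore lists, standing in for what an admin hides on the
# metadata page.
IGNORED_TABLES = {"salaries"}
IGNORED_COLUMNS = {("employees", "ssn")}


def deny_ignored(action, table, column, db_name, trigger):
    """SQLite authorizer callback, invoked while a statement is compiled."""
    if action == sqlite3.SQLITE_READ:
        if table in IGNORED_TABLES or (table, column) in IGNORED_COLUMNS:
            # Compilation fails, so the query never runs.
            return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def is_allowed(conn, sql):
    """Return True if the query compiles and runs under the guard."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.DatabaseError:  # covers "not authorized" and "no such table"
        return False


# Demo database: create the schema first, then install the guard.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.execute("CREATE TABLE salaries (employee_id INTEGER, amount REAL)")
conn.set_authorizer(deny_ignored)

print(is_allowed(conn, "SELECT name FROM employees"))     # True
print(is_allowed(conn, "SELECT ssn FROM employees"))      # False
print(is_allowed(conn, "SELECT * FROM employees"))        # False: * includes ssn
print(is_allowed(conn, "SELECT count(*) FROM salaries"))  # False
```

The check runs when the statement is compiled, so a query that touches a hidden column fails before any data is read. A `SELECT *` is also caught, because it expands to every column, including the hidden one.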

Hallucinated Filters

Not only do we guarantee the AI won’t hallucinate columns or tables, we also added a guarantee that the AI will never filter for a value that does not exist within your database. And we do this without indexing your data. Let me explain how.

How might a data analyst start querying a table they aren’t familiar with? They would likely look at the first few rows as examples. However, we explicitly do not index this information in our vector store. Instead, our AI Data Agent can look at the first few rows at runtime if it needs to. On its own, though, this ability to ‘sample rows’ is not enough, because the AI might still provide a value that does not exist. That’s where our verification engine comes into play. It parses the SQL query the AI provides and, at runtime, compares the filter values against the values that actually exist in the database. A SQL query is not allowed to execute unless every column in the WHERE clause is filtered on a valid value.
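Below is a rough sketch of runtime row sampling and filter-value verification. It makes several assumptions: a single-table query, a SQLite database, and only string literals compared with `=`. The function names (`sample_rows`, `filter_values`, `invalid_filters`, `run_verified`) are hypothetical. A production verifier would parse the SQL properly rather than rely on a regex.

```python
import re
import sqlite3

# Matches simple string filters such as  status = 'shipped'  or  "region" = 'EMEA'.
# Numeric literals, IN lists, LIKE, etc. are ignored; a real verifier would use
# a proper SQL parser instead of a regex.
EQ_PREDICATE = re.compile(r"""["]?(\w+)["]?\s*=\s*'((?:[^']|'')*)'""")


def sample_rows(conn, table, n=3):
    """Peek at a few rows at runtime instead of indexing table contents."""
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (n,)).fetchall()


def filter_values(sql):
    """Return (column, value) pairs for string equality filters after WHERE."""
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []
    return [(col, val.replace("''", "'")) for col, val in EQ_PREDICATE.findall(parts[1])]


def invalid_filters(conn, table, sql):
    """Return every filter whose column or value does not exist in `table`."""
    # `table` comes from trusted metadata, not from the model's SQL.
    columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    bad = []
    for column, value in filter_values(sql):
        if column not in columns:
            bad.append((column, value))
            continue
        hit = conn.execute(
            f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1', (value,)
        ).fetchone()
        if hit is None:
            bad.append((column, value))
    return bad


def run_verified(conn, table, sql):
    """Execute `sql` only if every string filter matches a real value."""
    bad = invalid_filters(conn, table, sql)
    if bad:
        raise ValueError(f"query rejected, filter values not found: {bad}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "shipped", "EMEA"), (2, "pending", "APAC")],
)
print(run_verified(conn, "orders", "SELECT id FROM orders WHERE status = 'shipped'"))
# [(1,)]
print(invalid_filters(
    conn, "orders", "SELECT id FROM orders WHERE status = 'Shipped' AND region = 'EMEA'"))
# [('status', 'Shipped')]
```

The second query shows the failure this check is meant to catch. SQLite compares text case-sensitively by default, so `'Shipped'` would silently return zero rows. The query is rejected before it runs.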

This avoids situations where the user gets back zero rows because of a bad filter value. Next, we are adding join verification to the engine. If a join is defined in the metadata, the AI should not be able to guess a different one. We will also add a separate setting that lets users enforce that no joins are made between tables unless they have been defined. This includes preserving primary keys and avoiding duplicate rows.
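Join verification is still on the roadmap, so this sketch only illustrates the idea. It does not show how Basejump will implement it. It assumes the metadata stores each defined join as an unordered pair of `(table, column)` endpoints. It also assumes every `ON` condition is written as `table.column = table.column`, without aliases. `DEFINED_JOINS` and `undefined_joins` are hypothetical names. Joins written with `USING`, joins expressed in the WHERE clause, and extra `AND` conditions are not handled.

```python
import re

# Hypothetical join definitions from the metadata page: each entry is an
# unordered pair of (table, column) endpoints.
DEFINED_JOINS = {
    frozenset({("orders", "customer_id"), ("customers", "id")}),
}

# Only handles the first condition after ON, written as table.column = table.column.
ON_CONDITION = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def undefined_joins(sql):
    """Return the join conditions in `sql` that are not defined in metadata."""
    found = []
    for t1, c1, t2, c2 in ON_CONDITION.findall(sql):
        # frozenset makes a = b and b = a count as the same join.
        if frozenset({(t1, c1), (t2, c2)}) not in DEFINED_JOINS:
            found.append(f"{t1}.{c1} = {t2}.{c2}")
    return found


print(undefined_joins(
    "SELECT * FROM orders JOIN customers ON customers.id = orders.customer_id"))
# []
print(undefined_joins(
    "SELECT * FROM orders JOIN customers ON orders.id = customers.id"))
# ['orders.id = customers.id']
```

Under the stricter setting described above, a non-empty result would block the query. Under the default setting, it could instead trigger a warning.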

How Does Basejump AI’s Data Agent Compare?

One of the largest companies in the ‘AI Data Analyst’ space is Snowflake. Here is an excerpt from their documentation listing the limitations of their AI Copilot:

Screenshot from the Snowflake AI Copilot documentation

None of these limitations apply to Basejump AI. The Snowflake screenshot above lists no filter verification, no SQL syntax verification, limits on the number of tables and columns, and slow recognition of new data sources. In contrast, Basejump verifies query syntax and filter values, and it can handle hundreds of tables and thousands of columns.

This is why Snowflake introduced Snowflake Cortex Analyst. However, that offering also falls short. Here is a quick comparison with Basejump AI:

Comparison to Snowflake Cortex AI

Providing Feedback to the AI Data Agent

Until now, users could verify results to improve the relevance of the information retrieved by our AI Data Agent. Now we’ve made it even easier with thumbs up/down reactions in the chat.

This is what the AI chat now looks like:

If you give a thumbs down, a modal opens where you can explain why the information may be incorrect.

Thumbs down issue report

The more users rely on this feature, the more relevant the AI’s results become. Good responses are cached, bad responses are removed from the cache, and the chat history is updated accordingly.
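The post doesn’t say how cached responses are keyed or stored. This deliberately simple sketch assumes an exact-match cache keyed on normalized question text. The `FeedbackCache` class and its methods are hypothetical. A real system might match similar questions semantically rather than exactly, and this sketch leaves out the chat-history update.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackCache:
    """Hypothetical question -> SQL cache, curated by thumbs up/down."""
    entries: dict = field(default_factory=dict)
    reports: list = field(default_factory=list)

    @staticmethod
    def _key(question):
        # Normalize case and whitespace so trivially different phrasings match.
        return " ".join(question.lower().split())

    def thumbs_up(self, question, sql):
        # A confirmed answer becomes the cached response for this question.
        self.entries[self._key(question)] = sql

    def thumbs_down(self, question, reason=""):
        # Evict the bad answer and keep the explanation from the modal.
        self.entries.pop(self._key(question), None)
        self.reports.append((question, reason))

    def lookup(self, question):
        return self.entries.get(self._key(question))


cache = FeedbackCache()
cache.thumbs_up("How many orders shipped?", "SELECT count(*) FROM orders WHERE status = 'shipped'")
print(cache.lookup("how many  orders shipped?"))  # cached SQL is returned
cache.thumbs_down("How many orders shipped?", "should exclude cancelled orders")
print(cache.lookup("How many orders shipped?"))   # None
```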

Private Storage

One feature we’re very excited about is private storage. A major benefit of Basejump AI is the ability to save data objects and reference them later. These data objects can be shared, compared, or even used to start a new chat with the AI. If users don’t want those data objects stored on Basejump servers, they now have the option to store them in their own AWS S3 object storage.

Screenshot of activating private storage

Basejump AI already doesn’t index the data inside your database tables. Now you also have the option to not store any data from your database with us at all, and keep all of it on your own servers. Many alternative solutions don’t offer this option, so we’re happy to provide it.
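Here is a minimal sketch of writing a saved data object to a customer-owned bucket. It uses the S3 client’s `put_object` call from boto3. The function name, the `basejump/results/...` key layout, and the JSON body format are assumptions and do not reflect Basejump’s actual storage schema. The client is passed in so the example can be exercised with a stand-in object.

```python
import json


def save_to_private_storage(s3_client, bucket, result_id, columns, rows):
    """Write a saved data object to the customer's own S3 bucket.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3"),
    created with the customer's own credentials.
    """
    key = f"basejump/results/{result_id}.json"  # hypothetical key layout
    body = json.dumps({"columns": columns, "rows": rows}).encode("utf-8")
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=body, ContentType="application/json"
    )
    # Only this reference needs to be kept; the data itself stays in the bucket.
    return f"s3://{bucket}/{key}"
```

With this design, the platform only has to remember the returned `s3://` URI. Reading the object back would use the same customer-owned credentials.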

Future Developments

That wraps it up for this release! Don’t forget to sign up for the webinar if you haven’t already, and come talk to my co-founder and me. We would love to meet you.

Also, if you're going to the Data Council conference, look for me in our snazzy new T-shirts:

Basejump AI Data Council T-Shirt

Thanks for reading — plenty more updates to come soon!
