
bug: Inconsistent I/O behavior in BigQuery backend regarding dataset specification #10547

Open
1 task done
everdark opened this issue Nov 30, 2024 · 1 comment
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis

Comments

@everdark

everdark commented Nov 30, 2024

What happened?

Ibis behaves inconsistently when reading tables from and writing tables to a BigQuery dataset.

When reading a table, ibis does not require the dataset_id to be set in ibis.bigquery.connect. The dataset can instead be given either as a namespaced table name such as dataset_name.table_name or through the database argument when reading the table. Ibis raises if both are specified. It also raises if the connection has no dataset_id and the table read provides neither a namespace nor a database argument.

For example, the following code will raise IbisInputError: Cannot specify database both in the table name and as an argument:

import ibis
conn = ibis.bigquery.connect("project", location="region")
conn.table("test1.test", database="test2")

whereas the following both work, each reading the table test from dataset test1:

conn = ibis.bigquery.connect("project", location="region")
conn.table("test1.test")
conn.table("test", database="test1")

So far so good. Writing a table, however, behaves very differently.

When writing a table, a namespaced table name alone does NOT work, which means the dataset must be specified either as a connection argument (dataset_id) or as an argument to the write call (database), and the latter overrides the former. Surprisingly, when BOTH a namespaced table name and a database argument are specified, the dataset in the namespace silently overrides the argument.

For example, the following fails with ValueError: Unable to determine BigQuery dataset.

conn = ibis.bigquery.connect("project", location="region")
conn.create_table("test1.test", data)

Meanwhile, the following (surprisingly) saves the table test to test1 rather than test2:

conn = ibis.bigquery.connect("project", location="region")
conn.create_table("test1.test", data, database="test2")

This is rather confusing.
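To make the reported write-side behavior concrete, here is a small pure-Python model of how the dataset appears to be resolved by create_table in ibis 9.5.0, reconstructed only from the two observations above. The function name observed_write_dataset is hypothetical and this is not ibis's actual code. It also assumes that a namespaced name overrides the connection's dataset_id, a case the report does not test.

```python
# Hypothetical model of the write-side behavior reported in this issue
# (ibis 9.5.0, BigQuery backend). NOT ibis source code; reconstructed
# from the observed results only.


def observed_write_dataset(name, database=None, dataset_id=None):
    """Return (dataset, table) the way the reported create_table resolves them."""
    # A dataset must come from the call (database) or the connection
    # (dataset_id); the call-level argument overrides the connection's.
    dataset = database or dataset_id
    if dataset is None:
        # Raised even when the table name itself is namespaced ("test1.test").
        raise ValueError("Unable to determine BigQuery dataset.")
    namespace, sep, table = name.rpartition(".")
    if sep:
        # The surprising part: the namespace in the name silently wins over
        # the database argument (assumed to win over dataset_id as well).
        dataset = namespace
    return dataset, table


print(observed_write_dataset("test1.test", database="test2"))  # ('test1', 'test')
```

Only the dataset.table form is modeled here; project-qualified names are out of scope for this sketch.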

For consistency with reading, the expected behavior is that a table can be saved either with a namespaced table name and no database argument, or with a bare table name and a database argument, and that specifying both raises an error.
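A minimal sketch of the proposed rule, assuming writes should use the same precedence ibis already applies when reading: the namespace in the name or the database argument (never both), then the connection-level dataset_id as a fallback. resolve_dataset is a hypothetical helper, the dataset_id fallback is an assumption, and ValueError stands in for ibis's IbisInputError.

```python
# Hypothetical sketch of the consistent rule proposed in this issue, which
# mirrors how ibis resolves the dataset when reading a table.
# NOT ibis code; ValueError stands in for IbisInputError.


def resolve_dataset(name, database=None, dataset_id=None):
    """Return (dataset, table) for reads and, under the proposal, writes."""
    namespace, sep, table = name.rpartition(".")
    if sep and database is not None:
        # Same error ibis already raises on the read path.
        raise ValueError(
            "Cannot specify database both in the table name and as an argument"
        )
    # Precedence: namespace in the name, else the database argument,
    # else the connection-level dataset_id (assumed fallback).
    dataset = namespace if sep else (database or dataset_id)
    if not dataset:
        raise ValueError("no BigQuery dataset specified")  # illustrative message
    return dataset, table


print(resolve_dataset("test1.test"))               # ('test1', 'test')
print(resolve_dataset("test", database="test1"))   # ('test1', 'test')
```

With this rule, conn.create_table("test1.test", data, database="test2") would raise instead of silently writing to test1.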

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@everdark everdark added the bug Incorrect behavior inside of ibis label Nov 30, 2024
@gforsyth gforsyth added the bigquery The BigQuery backend label Nov 30, 2024
@gforsyth
Member

Thanks for reporting this, @everdark !

BigQuery is the only backend (I think) that supports dotted path locations as the table name, which dates back to when it was independently maintained as a third-party backend.

We should definitely resolve the idiosyncrasies here.

Projects
Status: backlog
Development

No branches or pull requests

2 participants