Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data profiling for PinotDB fails for STRING columns. LEN() function should be LENGTH() #19187

Open
piby180 opened this issue Dec 28, 2024 · 0 comments · May be fixed by #19189
Open

Data profiling for PinotDB fails for STRING columns. LEN() function should be LENGTH() #19187

piby180 opened this issue Dec 28, 2024 · 0 comments · May be fixed by #19189

Comments

@piby180
Copy link
Contributor

piby180 commented Dec 28, 2024

Affected module
Does it impact the UI, backend or Ingestion Framework?
Ingestion Framework

Describe the bug
A clear and concise description of what the bug is.

LEN() function is being used in profiler ingestion for string columns. LEN() is not supported in PinotDB and must be replaced with LENGTH()

To Reproduce

Screenshots or steps to reproduce

Deploy a sample table on Pinot with string columns and run profiler ingestion for it in Openmetadata. Here are logs for a sample table airlineStats with string column CancellationCode . The query works after replacing LEN with LENGTH.

airlineStats.CancellationCode metric_type.value: (pinotdb.exceptions.DatabaseError) {'errorCode': 720, 'message': 'QueryPlanningError:\n' 'Error composing query plan for \'/* {"app": "OpenMetadata", ' '"version": "1.6.1.0"} */\n' 'SELECT avg(LEN(CancellationCode)) AS mean, ' 'count(CancellationCode) AS "valuesCount", count(DISTINCT ' 'CancellationCode) AS "distinctCount", MIN(LEN(CancellationCode)) ' 'AS "min", MAX(LEN(CancellationCode)) AS "max", ' 'coalesce(SUM(CAST(CASE WHEN (CancellationCode IS NULL) THEN 1 ' 'ELSE 0 END AS BIGINT)), 0) AS "nullCount", ' 'STDDEV_POP(LEN(CancellationCode)) AS stddev, ' 'SUM(CAST(LEN(CancellationCode) AS BIGINT)) AS "sum" \n' 'FROM "default"."airlineStats"\n' " LIMIT 1': From line 2, column 12 to line 2, column 32: No match " "found for function signature LEN(<CHARACTER>)'\n" 'org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:196)\n' 'org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:174)\n' 'org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:168)\n' 'org.apache.pinot.broker.requesthandler.BrokerRequestHandlerDelegate.handleRequest(BrokerRequestHandlerDelegate.java:116)\n' 'From line 2, column 12 to line 2, column 32: No match found for ' 'function signature LEN(<CHARACTER>)\n' 'jdk.internal.reflect.GeneratedConstructorAccessor72.newInstance(Unknown ' 'Source)\n' 'java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n' 'java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)\n' 'org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:507)\n' 'No match found for function signature LEN(<CHARACTER>)\n' 'jdk.internal.reflect.GeneratedConstructorAccessor73.newInstance(Unknown ' 'Source)\n' 'java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n' 'java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)\n' 'org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:507)\n'} [SQL: /* {"app": "OpenMetadata", "version": "1.6.1.0"} */ SELECT avg(LEN(CancellationCode)) AS mean, count(CancellationCode) AS "valuesCount", count(DISTINCT CancellationCode) AS "distinctCount", MIN(LEN(CancellationCode)) AS "min", MAX(LEN(CancellationCode)) AS "max", coalesce(SUM(CAST(CASE WHEN (CancellationCode IS NULL) THEN %(param_1)s ELSE %(param_2)s END AS BIGINT)), %(coalesce_1)s) AS "nullCount", STDDEV_POP(LEN(CancellationCode)) AS stddev, SUM(CAST(LEN(CancellationCode) AS BIGINT)) AS "sum" FROM "default"."airlineStats" LIMIT %(param_3)s] [parameters: {'param_1': 1, 'param_2': 0, 'coalesce_1': 0, 'param_3': 1}] (Background on this error at: https://sqlalche.me/e/14/4xp6)

Expected behavior
A clear and concise description of what you expected to happen.

Profier ingestion should work for string columns

Version:

  • OS: [e.g. iOS]
  • Python version:
  • OpenMetadata version: [e.g. 0.8] 1.6.1
  • OpenMetadata Ingestion package version: [e.g. openmetadata-ingestion[docker]==XYZ]

Additional context
Add any other context about the problem here.

@piby180 piby180 linked a pull request Dec 28, 2024 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant