This issue was moved to a discussion.

Python Operator reports an error when connecting to Clickhouse using Clickhouse connect #45226

Closed
2 tasks done
caicancai opened this issue Dec 27, 2024 · 2 comments
Labels
area:core, kind:bug, needs-triage

Comments

@caicancai
Member

caicancai commented Dec 27, 2024

Apache Airflow version

2.10.4

If "Other Airflow 2 version" selected, which one?

No response

What happened?

import clickhouse_connect

def show_tables():
    client = clickhouse_connect.create_client(
        interface="https",
        host='localhost',
        port=9090,
        user='default',
        password='123456',
        ca_cert='/Users/cc.cai/develop/clickhouse/testing/testing-ca.crt'
    )
    result = client.command("select 1")
    print(result)

The Python function above runs successfully when executed locally, but when I run it through PythonOperator, an error is reported:

import datetime

import clickhouse_connect
import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

def show_tables():
    client = clickhouse_connect.create_client(
        interface="https",
        host='localhost',
        port=9090,
        user='default',
        password='123456',
        ca_cert='/Users/cc.cai/develop/clickhouse/testing/testing-ca.crt'
    )
    result = client.command("select 1")
    print(result)

with DAG(
    dag_id="example_python_operator",
    schedule="0 0 * * *",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    dagrun_timeout=datetime.timedelta(minutes=60),
    tags=["example", "example2"],
    params={"example_key": "example_value"},
) as dag:
    show_tables_task = PythonOperator(
        task_id='show_tables_task',
        python_callable=show_tables,
        dag=dag
    )

Traceback (most recent call last):

  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/airflow/models/dagbag.py", line 383, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/airflow/example_dags/example_python_operator.py", line 57, in <module>
    show_tables()
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/airflow/example_dags/example_python_operator.py", line 31, in show_tables
    client = clickhouse_connect.create_client(
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/clickhouse_connect/driver/__init__.py", line 115, in create_client
    return HttpClient(interface, host, port, username, password, database, settings=settings, **kwargs)
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/clickhouse_connect/driver/httpclient.py", line 157, in __init__
    super().__init__(database=database,
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/clickhouse_connect/driver/client.py", line 69, in __init__
    self._init_common_settings(apply_server_timezone)
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/clickhouse_connect/driver/client.py", line 74, in _init_common_settings
    tuple(self.command('SELECT version(), timezone()', use_database=False))
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/clickhouse_connect/driver/httpclient.py", line 351, in command
    response = self._raw_request(payload, params, headers, method, fields=fields, server_wait=False)
  File "/Users/cc.cai/airflow_venv/lib/python3.10/site-packages/clickhouse_connect/driver/httpclient.py", line 449, in _raw_request
    raise OperationalError(f'Error {ex} executing HTTP request attempt {attempts}{err_url}') from ex
clickhouse_connect.driver.exceptions.OperationalError: Error HTTPSConnectionPool(host='clickhouse-testing.automizely.me', port=9090): Max retries exceeded with url: /? (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:
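
Two details in the traceback are worth noting before debugging further. The failing call is a module-level `show_tables()` at line 57 of the DAG file, reached from `dagbag.py` while the file is being parsed, and the host in the error is `clickhouse-testing.automizely.me` rather than the `localhost` shown in the snippet, so the code that ran may differ from what is posted here. One cheap thing to rule out is whether the process that loads the DAG can actually read the CA file. The following is a minimal sketch using only the standard library; `check_ca_bundle` is a hypothetical helper name, not part of Airflow or clickhouse-connect.

```python
import os
import ssl


def check_ca_bundle(path):
    """Report whether `path` is a readable CA bundle in the current process.

    Hypothetical diagnostic helper: it only inspects the file, it does not
    contact any server.
    """
    report = {
        "path": path,
        "exists": os.path.isfile(path),
        "loaded_certs": 0,
        "error": None,
    }
    if not report["exists"]:
        report["error"] = "file not found in this environment"
        return report

    # A fresh client context starts with an empty trust store, so the count
    # below reflects only what was loaded from `path`.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        ctx.load_verify_locations(cafile=path)
        report["loaded_certs"] = ctx.cert_store_stats()["x509"]
    except (ssl.SSLError, OSError) as exc:
        # Raised for unreadable files or files that contain no PEM certificate.
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    print(check_ca_bundle("/Users/cc.cai/develop/clickhouse/testing/testing-ca.crt"))
```

Calling `check_ca_bundle(...)` with the same path from inside a task and from a local shell, then comparing the two reports, would show whether both environments see the same bundle.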

What you think should happen instead?

No response

How to reproduce

Run the standalone show_tables() function and the DAG listed under "What happened?" above. The function succeeds when executed directly, but running it through PythonOperator fails with:

Max retries exceeded with url: /? (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed

Operating System

pythonoperator

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@caicancai added the area:core, kind:bug, and needs-triage labels Dec 27, 2024

boring-cyborg bot commented Dec 27, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@caicancai
Member Author

I tried SimpleHttpOperator and had the same problem.
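
If SimpleHttpOperator fails the same way, the problem is probably in how the Airflow process verifies the server certificate rather than in a particular operator or client library. As a rough way to check that without Airflow or clickhouse-connect in the picture, a verified TLS handshake can be attempted with the standard library alone. This is a sketch under assumptions: `probe_tls` is a hypothetical helper, and the host, port, and CA path must be the ones the task really uses.

```python
import socket
import ssl


def probe_tls(host, port, ca_file=None, timeout=5.0):
    """Attempt a verified TLS handshake and return a short status string.

    Hypothetical diagnostic helper. When `ca_file` is given, only that bundle
    is trusted; otherwise the system default trust store is used.
    """
    try:
        # create_default_context enables certificate and hostname checks.
        ctx = ssl.create_default_context(cafile=ca_file)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return f"ok: {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries OpenSSL's reason, e.g. an untrusted issuer
        # or a hostname mismatch.
        return f"verify failed: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        # Covers a missing CA file, refused connections, timeouts, and
        # non-verification TLS errors.
        return f"connection failed: {exc}"
```

For example, `probe_tls('clickhouse-testing.automizely.me', 9090, ca_file='/Users/cc.cai/develop/clickhouse/testing/testing-ca.crt')` executed inside a PythonOperator task and again in a local shell should return the same string if both environments trust the same CA.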

@apache apache locked and limited conversation to collaborators Dec 27, 2024
@potiuk potiuk converted this issue into discussion #45256 Dec 27, 2024
