[CI] Migrate to pyproject.toml and use poetry for deterministic builds #201

Open · wants to merge 1 commit into base: master
32 changes: 8 additions & 24 deletions .github/workflows/ci.yml
@@ -16,21 +16,6 @@ concurrency:
  cancel-in-progress: true

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - name: "Checkout the source code"
        uses: actions/checkout@v2

      - name: "Install Python"
        uses: actions/setup-python@v2

      - name: "Install pre-commit"
        run: pip install pre-commit

      - name: "Run pre-commit checks"
        run: pre-commit run --all-files
Comment on lines -28 to -32
Member:
We generally keep linting and static analysis jobs separate from jobs which run tests. Makes it easier to see what failed.

It looks like now pre-commit runs as part of the test jobs themselves.

Contributor Author:
@hashhar the reason is to try to adhere to the DRY principle and ensure that local and remote CI are consistent. In this instance, rather than explicitly calling out each test/step in the GitHub Actions workflow, everything is now defined by tox. Any updates to tox.ini will be reflected in the remote CI.

Member:

While that is a reasonable benefit, it has its downsides as well: namely that in case of failures you need to go to the CI run logs to see what exactly failed, and unless you know what to search/grep for, it's painful.

Contributor Author (@john-bodley, Jul 27, 2022):
There are pros and cons with everything. In general, adhering to the KISS principle—simplifying logic and ensuring consistency—likely outweighs the cons. It's worth noting the errors are present in the logs, though you may have to toggle an additional chevron.


Comment on lines -28 to -33
Member:

If those steps are removed, then I assume the checks job is redundant. The outstanding steps should be moved to the build job.

Member:

However, it would be nice to still execute pre-commit checks in the checks job. Then build could be run only if the checks job is successful, which would reduce runner usage.

Contributor Author:

@hovaesco is it necessary to split out the checks vs the build? I agree that one could argue that pre-commit performs pre-flight checks, however i) it does mean the DRY principle isn't really adhered to, and ii) you could argue (per the name) that these should actually be first run locally (alongside all the other local CI checks) at the time of the commit. This is actually the most efficient use of resources and results in faster iteration.

Member:

You cannot control the behaviour of people who submit PRs. It's still useful to let people submit PRs in an imperfect shape and guide them to the finish line so that their next contribution is easier.

Also, keeping the checks split has two benefits:

  • Lower wall-time for each check.
  • It's easy to see at a glance what failed with zero additional clicks (that's also why a matrix was used instead of tox for the version-based tests).

Member:

I agree with the arguments given by @hashhar that CI should still have those jobs split out, plus it will reduce runner usage.

  build:
    runs-on: ubuntu-latest
    strategy:
@@ -47,15 +32,13 @@ jobs:
        trino: [
          "latest",
        ]
        sqlalchemy: [
          "~=1.4.0"
        ]
        include:
          # Test with older Trino versions for backward compatibility
          - { python: "3.10", trino: "351", sqlalchemy: "~=1.4.0" } # first Trino version
          # Test with sqlalchemy 1.3
          - { python: "3.10", trino: "latest", sqlalchemy: "~=1.3.0" }
          - { python: "3.10", trino: "351" } # first Trino version
          # Test with Trino version that requires result set to be fully exhausted
          - { python: "3.10", trino: "395" }
    env:
      TOX_PARALLEL_NO_SPINNER: 1
      TRINO_VERSION: "${{ matrix.trino }}"
    steps:
      - uses: actions/checkout@v2
@@ -66,7 +49,8 @@ jobs:
        run: |
          sudo apt-get update
          sudo apt-get install libkrb5-dev
          pip install .[tests] sqlalchemy${{ matrix.sqlalchemy }}
      - name: Run tests
          sudo curl -sSL https://install.python-poetry.org | python3
          poetry install --only dev
      - name: Run tox
        run: |
          pytest -s tests/
          poetry run tox --parallel
Member:

Why is --parallel needed?

Contributor Author:

@hashhar rather than multiple GitHub jobs, this workflow is configured to have only one job, and thus from a wall-clock perspective having the relevant tox environments run in parallel is more desirable.

Member:

Having a single job makes it harder to inspect which env is failing without going to the actions page and checking the logs. Any way we can preserve the current matrix behaviour?

Contributor Author:

See #201 (comment). I'm really not certain whether the hypothesized concerns are problematic in reality. There's very little friction in introspecting the GitHub workflow logs to determine why CI is failing.

98 changes: 93 additions & 5 deletions README.md
@@ -424,12 +424,12 @@ exits the *with* context and the queries succeed, otherwise

## Improved Python types

If you enable the flag `experimental_python_types`, the client will convert the results of the query to the
corresponding Python types. For example, if the query returns a `DECIMAL` column, the result will be a `Decimal` object.

Limitations of the Python types are described in the
[Python types documentation](https://docs.python.org/3/library/datatypes.html). These limitations will generate an
exception `trino.exceptions.DataError` if the query returns a value that cannot be converted to the corresponding Python
type.

```python
@@ -457,7 +457,95 @@ assert cur.description[0][1] == "timestamp with time zone"

Feel free to create an issue as it makes your request visible to other users and contributors.

## Getting Started With Development

Start by forking the repository and then modify the code in your fork.

Clone the repository and go inside the code directory.

Python dependencies are managed using [Poetry](https://python-poetry.org/), which helps ensure that builds are deterministic. Poetry [creates a virtual environment](https://python-poetry.org/docs/managing-environments/) to aid with the process. Poetry should be installed via:

```
$ curl -sSL https://install.python-poetry.org | python3
```
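
Once Poetry is installed, the project dependencies declared in `pyproject.toml` can be installed into the Poetry-managed virtual environment. A minimal sketch; the exact dependency groups depend on the `pyproject.toml` introduced by this change:

```
# resolves and installs the dependencies declared in pyproject.toml (and poetry.lock, if committed)
$ poetry install
```

Resolving against a committed lock file is what keeps local and CI builds deterministic.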

When the code is ready, submit a Pull Request.

### Code Style

- For Python code, adhere to PEP 8.
- Prefer code that is readable over one that is "clever".
- When writing a Git commit message, follow these [guidelines](https://chris.beams.io/posts/git-commit/).

### Running Tests

`trino-python-client` uses [tox](https://tox.wiki/en/latest/)—a tool for standardizing testing in Python—which leverages the [pytest](https://pytest.org/) testing framework. To run
only unit tests, type:

```
$ poetry run tox -e <environment> -- tests/unit
```

Member:

Is there a way to have a reasonable default so that when developing locally contributors don't need to think about which env to use? CI will do the exhaustive tests.

Contributor Author:

@hashhar local development likely wants to run multiple environments, e.g., py39, pre-commit, etc. Personally, having used tox for a number of years, I have no issue with explicitly invoking the Python-specific environment when running the unit/integration tests.

Member (@hashhar, Jul 27, 2022):

We should reduce friction for contributors (another benefit being that everyone running tox will see the same env unless they specify one explicitly). Generally people shouldn't need to care what env they are testing against unless the issue is env-specific (then they can provide their own choice of env).

Contributor Author:

I'm not really sure explicitly specifying an environment is friction. It's worth noting that local CI likely requires multiple environments, e.g., a common workflow could be to first run tox -e pre-commit to ensure linting, typing, etc. are correct before running the tests via, say, tox -e py39.

One could argue having these environments—a subset of which can be run via the -e argument—actually reduces friction, as it abstracts the actual commands away from the user, i.e., rather than the user needing to know they must run the pytest, pre-commit, etc. commands (along with the relevant flags/arguments), they simply invoke the respective environment through the tox CLI, which standardizes testing workflows.

Member:

Perhaps adding some examples like poetry run tox -e py39 -- tests/unit could be a reasonable consensus? Contributors usually follow the README first. Or we could describe a sample local testing workflow, e.g., run pre-commit first and then run the tests.


Similarly to run only integration tests, type:

```
$ poetry run tox -e <environment> -- tests/integration
```

where `<environment>` denotes the Python environment (see the configuration in `tox.ini`).
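
To see which environments are defined, tox can list them; the exact names depend on the `tox.ini` introduced by this change:

```
# list the tox environments defined in tox.ini
$ poetry run tox -l
```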

Then you can pass options like `--pdb` or anything supported by `pytest --help`.
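
For instance, a concrete invocation that runs the unit tests under one environment and drops into the debugger on failure might look like the following; `py39` is an assumed environment name, taken from the example suggested in the review above:

```
# "py39" is an assumed environment name; substitute one listed by `tox -l`
$ poetry run tox -e py39 -- tests/unit --pdb
```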

The integration tests pull a Docker image and then run a container with a Trino server (see the example invocation after this list):
- the image is named `trinodb/trino:${TRINO_VERSION}`
- the container is named `trino-python-client-tests-{uuid4()[:7]}`
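
Because the image tag comes from `TRINO_VERSION` (the same variable the CI workflow sets), a specific Trino version can be targeted locally by exporting it first; the values below are illustrative and `py39` is an assumed environment name:

```
# 395 is an illustrative Trino version; "py39" is an assumed tox environment name
$ TRINO_VERSION=395 poetry run tox -e py39 -- tests/integration
```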

### pre-commit

`trino-python-client` leverages [pre-commit](https://pre-commit.com/) to help identify simple issues before submission to code review. Checks include the validity of the `pyproject.toml` file, type checks via [Mypy](https://github.com/python/mypy), etc. To enable `pre-commit` run:

```
poetry run pre-commit install
```

which will run on every commit. You can also run it anytime using:

```
poetry run tox -e pre-commit
```
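
You can also run a single hook rather than the whole suite; the hook id below (`mypy`) is an assumption, so check the repository's pre-commit configuration for the actual ids:

```
# "mypy" is an assumed hook id; see .pre-commit-config.yaml for the real ones
poetry run pre-commit run mypy --all-files
```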

### Releasing

- [Set up your development environment](#getting-started-with-development).
- Check the local workspace is up to date and has no uncommitted changes
```bash
git fetch -a && git status
```
- Change version in `pyproject.toml` to a new version, e.g. `0.123.0`.
- Commit
```bash
git commit -a -m "Bump version to 0.123.0"
```
- Create an annotated tag
```bash
git tag -m "" 0.123.0
```
- Create release package and upload it to PyPI
```bash
poetry publish --build &&
```

Member:

any possibility of clearing old state/artifacts for safety?

Contributor Author (@john-bodley, Jul 26, 2022):

@hashhar I'm unsure. There's python-poetry/poetry#1329, however it has been open for some time.

Member:

Let's keep the `rm -rf dist/` in the steps - I verified that Poetry indeed doesn't clean up old artifacts.

```bash
open https://pypi.org/project/trino/ &&
echo "Released!"
```
- Push the branch and the tag
```bash
git push upstream master 0.123.0
```
- Send release announcement.

## Need Help?

Feel free to create an issue as it makes your request visible to other users and contributors.

If an interactive discussion would be better or if you just want to hangout and chat about
the Trino Python client, you can join us on the *#python-client* channel on
[Trino Slack](https://trino.io/slack.html).
