
Releases: matthewwardrop/formulaic

v1.1.1

20 Dec 19:43

New features and enhancements:

  • Formula.differentiate() is now considered stable, with
    ModelMatrix.differentiate() to follow in a future release. (#236)
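To illustrate what differentiating a formula means, here is a minimal pure-Python sketch. It is not formulaic's implementation, and the helper names are hypothetical. Each term is treated as a set of factor names, and every factor is assumed to enter linearly. Terms that contain the variable lose that factor, and all other terms vanish.

```python
def differentiate_terms(terms: list[frozenset[str]], wrt: str) -> list[frozenset[str]]:
    """Differentiate a sum of terms with respect to `wrt`, term by term.

    Each term is a frozenset of factor names; the intercept is the empty set.
    Assumes every factor enters linearly (no powers or other transforms).
    """
    result: list[frozenset[str]] = []
    for term in terms:
        if wrt in term:
            derived = term - {wrt}  # d(a*b)/da = b; d(a)/da = 1 (the empty set)
            if derived not in result:  # keep first-seen order, drop duplicates
                result.append(derived)
    return result


def format_terms(terms: list[frozenset[str]]) -> str:
    # Render the empty set as the intercept "1" and interactions as "a:b".
    return " + ".join(":".join(sorted(term)) or "1" for term in terms)


terms = [frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]
print(format_terms(differentiate_terms(terms, "a")))  # 1 + b
```

Keeping the first-seen order of terms is one simple way to make the output deterministic; it is not necessarily how formulaic orders the terms of a differentiated formula.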

Bugfixes and cleanups:

  • Fixed a regression introduced in v1.1.0 regarding ordering of terms in a
    differentiated formula. (#236)

v1.1.0

16 Dec 03:41

This is a major feature release, motivated in many respects by the migration of statsmodels from patsy to formulaic. Many thanks to @bashtage for driving those invasive changes forward. There are some semantic breaking changes, but unless you depend on the internals of formulaic (which I do not believe any external library does), they are not expected to break common usage.

Breaking changes:

  • Formula is no longer always "structured", with special-case handling for
    formulae that have no structure. Legacy shims have been added to support old
    patterns, and they raise DeprecationWarnings when used. This is not expected
    to break anyone unless they explicitly check whether Formula.root is a list
    instance (which formerly should simply have been assumed); it is now a
    SimpleFormula instance that acts like an ordered sequence of Term instances.
  • The column names associated with categorical factors have changed. Previously,
    a prefix was unconditionally added to the level in the column name, as in
    feature[T.A], whether or not the encoding resulted in that term acting as a
    contrast. Now, in keeping with patsy, the prefix is only added if the
    categorical factor is encoded with reduced rank. Otherwise, feature[A] is
    used instead.
  • formulaic.parsers.types.structured has been promoted to
    formulaic.utils.structured.
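The column naming rule above can be sketched in pure Python. The helper below is hypothetical, not part of formulaic's API, and it assumes treatment (dummy) coding in which the first level is the reference level and gets no column when the encoding has reduced rank.

```python
def categorical_column_names(feature: str, levels: list[str], reduced_rank: bool) -> list[str]:
    """Sketch of column naming for a treatment-coded categorical factor."""
    if reduced_rank:
        # Contrasts against the reference level (levels[0]) get the "T." prefix.
        return [f"{feature}[T.{level}]" for level in levels[1:]]
    # Full-rank encoding: one indicator column per level, no prefix.
    return [f"{feature}[{level}]" for level in levels]


print(categorical_column_names("feature", ["A", "B", "C"], reduced_rank=True))
# ['feature[T.B]', 'feature[T.C]']
print(categorical_column_names("feature", ["A", "B", "C"], reduced_rank=False))
# ['feature[A]', 'feature[B]', 'feature[C]']
```

In a formula with an intercept, a categorical factor is typically encoded with reduced rank, so its columns are contrasts; in a formula without an intercept it may be encoded at full rank instead.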

New features and enhancements:

  • Formula now instantiates to SimpleFormula or StructuredFormula, the
    latter being a tree structure of SimpleFormula instances (rather than of
    List[Term], as previously). This simplifies various internal logic and makes
    the propagation of formula metadata more explicit. (#222)
  • Added support for restricting the set of features used by the default formula
    parser so that libraries can more easily restrict the structure of output
    formulae. (#207)
  • dict and recarray types are now associated with the pandas materializer
    by default (rather than raising), simplifying some user workflows. (#225)
  • Added support for the . operator (which is replaced with all variables not
    used on the left-hand side of the formula). (#216)
  • Added experimental support for nested formulae of form [ ... ~ ... ].
    This is useful for (e.g.) generating formulae for IV 2SLS. (#108)
  • Added support for subsetting ModelSpec[s] based on an arbitrary
    strictly reduced FormulaSpec. (#208)
  • Added Formula.required_variables to more easily surface the expected data
    requirements of the formula. (#205)
  • Added support for extracting rows dropped during materialization. (#197)
  • Added support for cyclic (cc) and natural (cr) cubic splines. See
    formulaic.materializers.transforms.cubic_spline.cubic_spline for
    more details.
  • Added a lag() transform.
  • Constructing LinearConstraints can now be done from a list of strings (for
    increased parity with patsy). (#201)
  • Categorical factor levels are now prefixed with (e.g.) T. only when they
    actually describe contrasts (i.e. when they are encoded with reduced rank). (#220)
  • Contrasts metadata is now added to the encoder state via encode_categorical,
    and is surfaced via ModelSpec.factor_contrasts. (#204)
  • Operator instances now receive context, which is optionally specified by
    the user during formula parsing and updated by the parser. This is what makes
    the . implementation possible. (#216)
  • Given the generic usefulness of Structured, it has been promoted to
    formulaic.utils. (#223)
  • Added explicit support and testing for Python 3.13. (#202)
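To illustrate the semantics of the . operator, here is a simplified string-based sketch. The expand_dot helper is hypothetical: formulaic performs this substitution inside its parser using operator context, not with regular expressions. The sketch also assumes the left-hand side is a plain +-separated list of variable names.

```python
import re

# A "." that is not part of a dotted name such as np.log or a number such as 1.5.
_DOT = re.compile(r"(?<![\w.])\.(?![\w.])")


def expand_dot(formula: str, columns: list[str]) -> str:
    """Replace a standalone "." with every column not used on the left-hand side."""
    if "~" in formula:
        lhs, rhs = (part.strip() for part in formula.split("~", 1))
    else:
        lhs, rhs = "", formula.strip()
    lhs_vars = {name.strip() for name in lhs.split("+") if name.strip()}
    remaining = [col for col in columns if col not in lhs_vars]
    # Parenthesize so the expansion keeps its grouping inside larger expressions.
    expansion = "(" + " + ".join(remaining) + ")"
    rhs = _DOT.sub(lambda _match: expansion, rhs)
    return f"{lhs} ~ {rhs}" if lhs else rhs


print(expand_dot("y ~ .", ["y", "a", "b"]))  # y ~ (a + b)
print(expand_dot("y ~ . + np.log(a)", ["y", "a", "b"]))  # y ~ (a + b) + np.log(a)
```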

Bugfixes and cleanups:

  • Fixed nested ordering of Formula instances. (#200)
  • Allowed Python tokens to contain multiple chained parentheses and brackets
    without using quotes, as long as the parentheses are balanced. (#214, #218)
  • Reduced the number of redundant initialisation operations in Structured
    instances. (#200)
  • Fixed pickling ModelMatrix and FactorValues instances (whenever wrapped
    objects are picklable). (#209; thanks @bashtage)
  • basis_spline: Fixed evaluation involving datasets with null values, and
    disallowed out-of-bounds knots. (#217; thanks @bashtage)
  • Improved robustness of data contexts involving PyArrow datasets.
  • We now use the same sentinels throughout the code base, rather than having
    module-specific sentinels in some places.
  • Migrated to ruff for linting, and updated mypy and pre-commit tooling.
  • Automatic fixes from ruff are now applied when running
    hatch run lint:format.
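The relaxed rule for Python tokens depends on whether brackets are balanced. A minimal stack-based balance check looks like the following. It is an illustrative sketch, not formulaic's tokenizer, and for simplicity it ignores the possibility of brackets inside string literals.

```python
_CLOSING = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(token: str) -> bool:
    """Return True if every (, [ and { in `token` is closed in the right order."""
    stack: list[str] = []
    for char in token:
        if char in "([{":
            stack.append(char)
        elif char in _CLOSING:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != _CLOSING[char]:
                return False
    return not stack  # any unclosed opener means the token is unbalanced


print(brackets_balanced("np.log(x)[0]"))  # True
print(brackets_balanced("f(g(x)[1]"))  # False
```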

Documentation:

  • Fixed and updated docsite build, as well as other minor tweaks.

v1.0.2

12 Jul 19:05

Bugfixes and cleanups:

  • Fix compatibility with pandas >=3.
  • Fix mypy type inference in materializer subclasses.

Documentation:

  • Add column name extraction to sklearn integration example.
  • Add section to allow users to indicate their usage of formulaic.

v1.0.1

25 Dec 05:46

This is identical to v1.0.0, but with the package status marked to production/stable rather than beta [facepalm].

v1.0.0

25 Dec 05:45

This is the first officially stable release of formulaic, with a relatively small diff from the 0.6.x series.

Breaking changes:

  • Python tokens are now canonically formatted (see below).
  • Methods deprecated during the 0.x series have been removed: Formula.terms,
    ModelSpec.feature_names, and ModelSpec.feature_indices.

New features and enhancements:

  • Python tokens are now sanitized and canonically formatted to prevent
    ambiguities and better align with patsy.
  • Added official support for Python 3.12 (no code changes were necessary).
  • Added the hashed transform for categorically encoding deterministically
    hashed representations of a dataset. [Contributed by @rishi-kulkarni]
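One way to canonically format a Python expression using only the standard library is to round-trip it through the ast module (ast.unparse is available in Python 3.9 and newer). This is a sketch of the general idea, not necessarily the exact procedure formulaic uses, and canonicalize is a hypothetical helper name.

```python
import ast


def canonicalize(code: str) -> str:
    """Normalize whitespace and quoting of a Python expression via an AST round trip."""
    expression = ast.parse(code, mode="eval")  # raises SyntaxError for invalid code
    return ast.unparse(expression.body)


print(canonicalize("np.log( x )"))  # np.log(x)
print(canonicalize('C(x, levels=["a","b"])'))  # C(x, levels=['a', 'b'])
```

Because equivalent spellings map to the same string, anything keyed on the formatted token (such as stored transform state) can presumably be found regardless of how the user typed the expression.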

Bugfixes and cleanups:

  • Fixed transform state not propagating correctly when Python code tokens were
    not canonically formatted.
  • Literals in formulae will no longer be silently ignored, and feature scaling
    is now fully supported.
  • Improved code parsing and formatting utilities and dropped the requirement for
    astor for Python 3.9 and newer.
  • Fixed all warnings emitted during unit tests.

Documentation:

  • Removed incompleteness warnings.
  • Added some lightweight developer documents.
  • Fixed some broken links.

v0.6.6

04 Oct 21:48

This is a minor release with one important bugfix.

Bugfixes and cleanups:

  • Fixed a regression introduced in 0.6.4 whereby missing variables were
    silently dropped from the formula rather than raising an exception.

v0.6.5

25 Sep 23:42

This is a minor release with several important bugfixes.

Bugfixes and cleanups:

  • Fixed intercept terms sorting after other features (by not counting literal
    factors toward the degree of a term). #156
  • Fixed a regression in 0.6.4 around quoted field names in Python evaluations. #154
  • Fixed detection and dropping of null rows in sparse datasets. #155
  • Fixed poly() transforms operating on datasets that include null values. #155
  • Arguments can now be passed when running the unit tests using hatch run tests.
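The intercept fix depends on how a term's degree is computed. The sketch below is illustrative only, since formulaic's real term ordering involves more than degree. Each term is a tuple of factor names, numeric literals such as the intercept "1" contribute nothing to the degree, and a stable sort by degree therefore places the intercept first.

```python
def is_literal(factor: str) -> bool:
    # Treat numeric factors such as "1" (the intercept) as literals.
    try:
        float(factor)
    except ValueError:
        return False
    return True


def degree(term: tuple[str, ...]) -> int:
    # Count only non-literal factors toward a term's degree.
    return sum(1 for factor in term if not is_literal(factor))


terms = [("a",), ("1",), ("a", "b"), ("b",)]
print(sorted(terms, key=degree))  # [('1',), ('a',), ('b',), ('a', 'b')]
```

If "1" were counted as a regular factor, it would tie with a and b at degree 1, and its position would then depend on the input order.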

v0.6.4

11 Jul 04:27

This is a minor release with several new features and cleanups.

New features and enhancements:

  • Added support for keeping track of the source of variables being used to
    evaluate a formula. Refer to the ModelSpec documentation for more details.

Bugfixes and cleanups:

  • All functions and methods now have type signatures that are statically checked
    during unit testing.
  • Removed OrderedDict usage, since Python guarantees the orderedness of
    dictionaries in Python 3.7+.
  • Suppress terms/factors in model matrices for which the factors evaluate to
    None.

v0.6.3

28 Jun 18:11

This is a minor release with a bugfix.

Bugfixes and cleanups:

  • Fixed a regression introduced in the previous release when materializing categorical encodings of variables with no levels.

v0.6.2

22 Jun 19:34

This is a minor release with several bugfixes.

Bugfixes and cleanups:

  • Fixed issues handling empty data sets in formulae that used categorical
    encoding.
  • Added the MIT license to distribution classifiers.