Releases: matthewwardrop/formulaic
v1.1.1
v1.1.0
This is a major feature release that was motivated in many aspects by the migration of `statsmodels` from `patsy` to `formulaic`. Many thanks to @bashtage for driving those invasive changes forward. There are some semantic breaking changes, but unless you are deep in the internals of `formulaic` (which I do not believe to be the case for any external library) these are not expected to break common usage.
Breaking changes:
- `Formula` is no longer always "structured" with special cases to handle the case where it has no structure. Legacy shims have been added to support old patterns, with `DeprecationWarning`s raised when they are used. It is not expected to break anyone not explicitly checking whether the `Formula.root` is a list instance (which formerly should have been simply assumed) [it is now a `SimpleFormula` instance that acts like an ordered sequence of `Term` instances].
- The column names associated with categorical factors have changed. Previously, a prefix was unconditionally added to the level in the column name like `feature[T.A]`, whether or not the encoding would result in that term acting as a contrast. Now, in keeping with `patsy`, we only add the prefix if the categorical factor is encoded with reduced rank. Otherwise, `feature[A]` will be used instead.
- `formulaic.parsers.types.structured` has been promoted to `formulaic.utils.structured`.
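The column naming rule for categorical factors can be illustrated with a small stand-alone sketch. It does not call formulaic; `categorical_column_names` is a hypothetical helper written only to mirror the rule described above, and dropping the first level under reduced rank is an assumption (treatment coding with the first level as the reference).

```python
from typing import List, Sequence


def categorical_column_names(
    factor: str, levels: Sequence[str], reduced_rank: bool
) -> List[str]:
    """Hypothetical helper mimicking the naming rule from the release notes."""
    if reduced_rank:
        # Assumption: treatment coding drops the first level as the reference,
        # so the remaining columns act as contrasts and get the "T." prefix.
        return [f"{factor}[T.{level}]" for level in levels[1:]]
    # Full-rank encoding: one column per level, with no contrast prefix.
    return [f"{factor}[{level}]" for level in levels]


reduced = categorical_column_names("feature", ["A", "B", "C"], reduced_rank=True)
full = categorical_column_names("feature", ["A", "B", "C"], reduced_rank=False)
print(reduced)
print(full)
```

Code that matches columns by name (for example by searching for `[T.`) is the kind of usage most likely to notice this change.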
New features and enhancements:
- `Formula` now instantiates to `SimpleFormula` or `StructuredFormula`, the latter being a tree-structure of `SimpleFormula` instances (as compared to `List[Term]` previously). This simplifies various internal logic and makes the propagation of formula metadata more explicit. (#222)
- Added support for restricting the set of features used by the default formula parser so that libraries can more easily restrict the structure of output formulae. (#207)
- `dict` and `recarray` types are now associated with the `pandas` materializer by default (rather than raising), simplifying some user workflows. (#225)
- Added support for the `.` operator (which is replaced with all variables not used on the left-hand-side of formulae). (#216)
- Added experimental support for nested formulae of form `[ ... ~ ... ]`. This is useful for (e.g.) generating formulae for IV 2SLS. (#108)
- Added support for subsetting `ModelSpec[s]` based on an arbitrary strictly reduced `FormulaSpec`. (#208)
- Added `Formula.required_variables` to more easily surface the expected data requirements of the formula. (#205)
- Added support for extracting rows dropped during materialization. (#197)
- Added cubic spline support for cyclic (`cc`) and natural (`cr`) splines. See `formulaic.materializers.transforms.cubic_spline.cubic_spline` for more details.
- Added a `lag()` transform.
- Constructing `LinearConstraints` can now be done from a list of strings (for increased parity with `patsy`). (#201)
- Categorical factors are now preceded with (e.g.) `T.` when they actually describe contrasts (i.e. when they are encoded with reduced rank). (#220)
- Contrasts metadata is now added to the encoder state via `encode_categorical`, which is surfaced via `ModelSpec.factor_contrasts`. (#204)
- `Operator` instances now receive `context`, which is optionally specified by the user during formula parsing, and updated by the parser. This is what makes the `.` implementation possible. (#216)
- Given the generic usefulness of `Structured`, it has been promoted to `formulaic.utils`. (#223)
- Added explicit support and testing for Python 3.13. (#202)
Bugfixes and cleanups:
- Fixed nested ordering of `Formula` instance. (#200)
- Allow Python tokens to contain multiple chained parentheses and brackets without using quotes as long as the parentheses are balanced. (#214, #218)
- Reduced the number of redundant initialisation operations in `Structured` instances. (#200)
- Fixed pickling `ModelMatrix` and `FactorValues` instances (whenever wrapped objects are picklable). (#209; thanks @bashtage)
- `basis_spline`: Fixed evaluation involving datasets with null values, and disallow out-of-bounds knots. (#217; thanks @bashtage)
- Improved robustness of data contexts involving PyArrow datasets.
- We now use the same sentinels throughout the code-base, rather than having module specific sentinels in some places.
- Migrated to `ruff` for linting, and updated `mypy` and `pre-commit` tooling.
- Automatic fixes from `ruff` are automatically applied when using `hatch run lint:format`.
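The "balanced" condition mentioned for Python tokens in the list above can be pictured with a simple stack-based check. This is only a sketch of the general idea, not formulaic's tokenizer: `is_balanced` is a hypothetical helper, and it deliberately ignores complications such as delimiters inside quoted strings.

```python
CLOSERS = {")": "(", "]": "["}


def is_balanced(token: str) -> bool:
    """Hypothetical check that parentheses and brackets are properly nested."""
    stack = []
    for char in token:
        if char in "([":
            stack.append(char)
        elif char in CLOSERS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack.pop() != CLOSERS[char]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


chained_ok = is_balanced("a(b)[c](d)")
mismatched = is_balanced("a(b[c)")
unclosed = is_balanced("f(x")
print(chained_ok, mismatched, unclosed)
```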
Documentation:
- Fixed and updated docsite build, as well as other minor tweaks.
v1.0.2
Bugfixes and cleanups:
- Fix compatibility with `pandas` >=3.
- Fix `mypy` type inference in materializer subclasses.
Documentation:
- Add column name extraction to `sklearn` integration example.
- Add section to allow users to indicate their usage of formulaic.
v1.0.1
This is identical to v1.0.0, but with the package status marked to production/stable rather than beta [facepalm].
v1.0.0
This is the first officially stable release of formulaic, with a relatively small diff from the 0.6.x series.
Breaking changes:
- Python tokens are now canonically formatted (see below).
- Methods deprecated during the 0.x series have been removed: `Formula.terms`, `ModelSpec.feature_names`, and `ModelSpec.feature_indices`.
New features and enhancements:
- Python tokens are now sanitized and canonically formatted to prevent ambiguities and better align with `patsy`.
- Added official support for Python 3.12 (no code changes were necessary).
- Added the `hashed` transform for categorically encoding deterministically hashed representations of a dataset. [Contributed by @rishi-kulkarni]
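The idea behind categorically encoding deterministically hashed values can be sketched without formulaic. The helpers below (`hash_to_bucket`, `hashed_one_hot`) are hypothetical, and the choice of SHA-256 over `repr(value)` is an assumption made for illustration; the `hashed` transform's actual hashing scheme and signature are not described in these notes.

```python
import hashlib
from typing import List, Sequence


def hash_to_bucket(value: object, levels: int) -> int:
    """Map a value to one of `levels` buckets, identically on every run."""
    # Assumption: hash the repr of the value; unlike the built-in hash(),
    # hashlib digests are not randomised between interpreter sessions.
    digest = hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % levels


def hashed_one_hot(values: Sequence[object], levels: int) -> List[List[int]]:
    """One-hot encode each value by its hash bucket."""
    rows = []
    for value in values:
        row = [0] * levels
        row[hash_to_bucket(value, levels)] = 1
        rows.append(row)
    return rows


encoded = hashed_one_hot(["red", "green", "red"], levels=4)
print(encoded)
```

The appeal of this kind of encoding is that the number of output columns is fixed up front, at the cost of possible collisions between distinct values.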
Bugfixes and cleanups:
- Fixed transform state not propagating correctly when Python code tokens were not canonically formatted.
- Literals in formulae will no longer be silently ignored, and feature scaling is now fully supported.
- Improved code parsing and formatting utilities and dropped the requirement for `astor` for Python 3.9 and newer.
- Fixed all warnings emitted during unit tests.
Documentation:
- Removed incompleteness warnings.
- Added some lightweight developer documents.
- Fixed some broken links.
v0.6.6
This is a minor release with one important bugfix.
Bugfixes and cleanups:
- Fixes a regression introduced by 0.6.4 whereby missing variables would be silently dropped from the formula, rather than raising an exception.
v0.6.5
This is a minor release with several important bugfixes.
Bugfixes and cleanups:
- Fixed intercept terms sorting after other features (by not counting literal factors toward the degree of a term). #156
- Fixed a regression in 0.6.4 around quoted field names in Python evaluations. #154
- Fixed detection and dropping of null rows in sparse datasets. #155
- Fixed `poly()` transforms operating on datasets that include null values. #155
- Arguments can now be passed when running the unit tests using `hatch run tests`.
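The intercept-ordering fix in the list above (not counting literal factors toward a term's degree) can be pictured with a toy model of terms. This is a hedged sketch only: representing terms as tuples of factor strings, and the helpers `is_literal` and `degree`, are assumptions for illustration and not formulaic's internal data structures.

```python
from typing import List, Tuple

Term = Tuple[str, ...]  # assumed toy representation: a term is a tuple of factors


def is_literal(factor: str) -> bool:
    """Treat plain numeric strings such as "1" or "2.5" as literal factors."""
    return factor.replace(".", "", 1).isdigit()


def degree(term: Term) -> int:
    """Degree of a term, ignoring literal factors."""
    return sum(1 for factor in term if not is_literal(factor))


terms: List[Term] = [("x",), ("x", "z"), ("1",), ("z",)]
# sorted() is stable, so terms of equal degree keep their original order.
ordered = sorted(terms, key=degree)
print(ordered)
```

With literals excluded, the intercept term has degree 0 and therefore sorts ahead of every first-order term, instead of tying with them.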
v0.6.4
This is a minor release with several new features and cleanups.
New features and enhancements:
- Added support for keeping track of the source of variables being used to evaluate a formula. Refer to the `ModelSpec` documentation for more details.
Bugfixes and cleanups:
- All functions and methods now have type signatures that are statically checked during unit testing.
- Removed `OrderedDict` usage, since Python guarantees the orderedness of dictionaries in Python 3.7+.
- Suppress terms/factors in model matrices for which the factors evaluate to `None`.
v0.6.3
This is a minor release with a bugfix.
Bugfixes and cleanups:
- Fixed a regression introduced in the previous release when materializing categorical encodings of variables with no levels.
v0.6.2
This is a minor release with several bugfixes.
Bugfixes and cleanups:
- Fixed issues handling empty data sets in formulae that used categorical encoding.
- Added the MIT license to distribution classifiers.