Releases: googleapis/python-bigquery-dataframes
v1.8.0
1.8.0 (2024-05-31)
Features
- `merge` only generates a default index if both inputs already have an index (#733) (25d049c)
- Add `+`, `-` as unary ops, `^` binary op (#724) (968d825)
- Add `GroupBy.size()` to get number of rows in each group (#479) (1fca588)
- Add DataFrame `~` operator (#721) (354abc1)
- Add GeminiText 1.5 Preview models (#737) (56cbd3b)
- Add slot_millis and add stats to session object (#725) (72e9583)
- Adds bigframes.bigquery.array_to_string to convert array elements to delimited strings (#731) (f12c906)
- Allow functions decorated with `bpd.remote_function()` to execute locally (#704) (d850da6)
- Ensure `"bigframes-api"` label is always set on jobs, even if the API is unknown (#722) (1832778)
- Support `ml.SimpleImputer` in bigframes (#708) (4c4415f)
- Support type annotations to supply input and output types to `bpd.remote_function()` decorator (#717) (4a12e3c)
- Support type annotations with `bpd.remote_function()` and `axis=1` (a preview feature) (#730) (e5a2992)
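The type-annotation entries above (#717, #730) describe taking input and output types from a function's signature instead of from explicit arguments. A minimal sketch of that idea in plain Python follows. The helper `infer_types` is hypothetical and not part of bigframes, and the library's own logic may differ.

```python
import inspect
from typing import get_type_hints


def infer_types(func):
    """Return (input_types, output_type) read from a function's annotations.

    Hypothetical helper for illustration only; not a bigframes API.
    """
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    missing = [name for name in params if name not in hints]
    if "return" not in hints:
        missing.append("return")
    if missing:
        raise TypeError(f"missing annotations: {missing}")
    # Keep the parameter order of the signature.
    input_types = [hints[name] for name in params]
    return input_types, hints["return"]


def add_one(x: int) -> int:
    return x + 1


input_types, output_type = infer_types(add_one)
# The function itself is unchanged and still callable locally.
local_result = add_one(1)
```

With annotations present, explicit type arguments become redundant, which is why the defaults can be omitted.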
Bug Fixes
- Correct index labels in multiple aggregations for DataFrameGroupBy (#723) (6a78c89)
- Fix Null index assign series to column (#711) (ffb4b57)
- Set `bpd.remote_function()`'s `input_types` and `output_types` default to `None` to allow omitting them when type annotations are present (#729) (0e25a3b)
- Warn and disable time travel for linked datasets (#712) (085fa9d)
Performance Improvements
Documentation
v1.7.0
1.7.0 (2024-05-20)
Features
- `read_gbq_query` supports `filters` (9386373)
- `read_gbq` suggests a correct column name when one is not found (9386373)
- Add `DefaultIndexKind.NULL` to use as `index_col` in `read_gbq*`, creating an indexless DataFrame/Series (#662) (29e4886)
- Bigframes.bigquery.array_agg(SeriesGroupBy|DataFrameGroupby) (#663) (412f28b)
- To_datetime supports utc=False for string inputs (#579) (adf9889)
Bug Fixes
- `read_gbq_table` respects primary keys even when `filters` are set (#689) (9386373)
- Fix type error in test_cluster (#698) (14d81c1)
- Improve escaping of literals and identifiers (#682) (da9b136)
- Properly identify non-unique index in tables without primary keys (#699) (6e0f4d8)
- Remove a usage of the `resource` package when not available, such as on Windows (#681) (96243f2)
- The imported samples error and use peek() (#688) (1a0b744)
Performance Improvements
- Don't run query immediately from `read_gbq_table` if `filters` is set (9386373)
- Use a `LIMIT` clause when `max_results` is set (9386373)
Documentation
v1.6.0
1.6.0 (2024-05-13)
Features
- Add `DataFrame.__delitem__` (#673) (2218c21)
- Add `Series.case_when()` (#673) (2218c21)
- Add `strategy="quantile"` in KBinsDiscretizer (#654) (c6c487f)
- Add Series.combine (#680) (2fd1b81)
- Series.str.split (#675) (6eb19a7)
- Suggest correct options in bpd.options.bigquery.location (#666) (57ccabc)
- Support `axis=1` in `df.apply` for scalar outputs (#629) (f6bdc4a)
- Support gcf vpc connector in `remote_function` (#677) (9ca92d0)
- Warn with a more specific `DefaultLocationWarning` category when no location can be detected (#648) (e084e54)
Bug Fixes
Dependencies
- Add jellyfish as a dependency for spelling correction (57ccabc)
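The location suggestion feature in this release (#666) relies on fuzzy string matching. The sketch below shows the general idea with the standard library's `difflib` instead of jellyfish, so it uses a swapped-in technique and is not the library's implementation. The `suggest` helper and the `KNOWN_LOCATIONS` list are illustrative assumptions.

```python
import difflib

# Illustrative subset of location names; not an authoritative list.
KNOWN_LOCATIONS = ["us", "eu", "us-east4", "us-west1", "europe-west3", "europe-west9"]


def suggest(value, choices, n=1):
    """Return up to n choices that closely resemble value (hypothetical helper)."""
    return difflib.get_close_matches(value.lower(), choices, n=n, cutoff=0.6)


suggestion = suggest("us-esat4", KNOWN_LOCATIONS)
no_suggestion = suggest("zzz", KNOWN_LOCATIONS)
```

Returning an empty list when nothing is close enough lets the caller fall back to a plain "unknown value" message.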
Documentation
v1.5.0
1.5.0 (2024-05-07)
Features
- `bigframes.options` and `bigframes.option_context` now use thread-local variables to prevent context managers in separate threads from affecting each other (#652) (651fd7d)
- Add `ARIMAPlus.coef_` property exposing `ML.ARIMA_COEFFICIENTS` functionality (#585) (81d1262)
- Add a unique session_id to Session and allow cleaning up sessions (#553) (c8d4e23)
- Add the `bigframes.bigquery` sub-package with a `bigframes.bigquery.array_length` function (#630) (9963f85)
- Always do a query dry run when `option.repr_mode == "deferred"` (#652) (651fd7d)
- Custom query labels for compute options (#638) (f561799)
- Raise `NoDefaultIndexError` from `read_gbq` on clustered/partitioned tables with no `index_col` or `filters` set (#631) (73064dd)
- Support `index_col=False` in `read_csv` and `engine="bigquery"` (73064dd)
- Support gcf max instance count in `remote_function` (#657) (36578ab)
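The thread-local options change (#652) can be pictured with a small standard-library sketch. It assumes a hypothetical `option_context` and `get_option` built on `threading.local`; these are stand-ins for illustration and do not reproduce the bigframes implementation. The option values `"head"` and `"deferred"` are placeholders.

```python
import contextlib
import threading

_local = threading.local()  # each thread sees its own attributes


def get_option(name, default=None):
    """Read an option for the current thread (hypothetical helper)."""
    return getattr(_local, name, default)


@contextlib.contextmanager
def option_context(name, value):
    """Temporarily set an option for the current thread only."""
    missing = object()
    previous = getattr(_local, name, missing)
    setattr(_local, name, value)
    try:
        yield
    finally:
        # Restore the earlier state, including "not set at all".
        if previous is missing:
            delattr(_local, name)
        else:
            setattr(_local, name, previous)


seen_in_worker = []


def worker():
    # Runs while the main thread is inside option_context, yet sees the default.
    seen_in_worker.append(get_option("repr_mode", "head"))


with option_context("repr_mode", "deferred"):
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    seen_in_main = get_option("repr_mode", "head")

after_context = get_option("repr_mode", "head")
```

Because the storage is per thread, a context manager entered in one thread cannot leak its setting into another.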
Bug Fixes
- Don't raise UnknownLocationWarning for US or EU multi-regions (#653) (8e4616b)
- Downgrade NoDefaultIndexError to DefaultIndexWarning (#658) (2715d2b)
- Fix bug with na in the column labels in stack (#659) (4a34293)
- Use explicit session in `PaLM2TextGenerator` (#651) (e4f13c3)
Documentation
v1.4.0
1.4.0 (2024-04-29)
Features
- Add .cache() method to persist intermediate dataframe (#626) (a5c94ec)
- Add transpose support for small homogeneously typed DataFrames. (#621) (054075d)
- Allow single input type in `remote_function` (#641) (3aa643f)
- Expose gcf max timeout in `remote_function` (#639) (dfeaad0)
- Series binary ops compatible with more types (#618) (518d315)
- Support the `score` method for `PaLM2TextGenerator` (#634) (3ffc1d2)
Bug Fixes
- Allow to_pandas to download more than 10GB (#637) (ce56495)
- Extend row hash to 128 bits to guarantee unique row id (#632) (9005c6e)
- Llm fine tuning tests (#627) (4724a1a)
- Llm palm score tests (#643) (cf4ec3a)
Performance Improvements
- Automatically condense internal expression representation (#516) (03c1b0d)
- Cache transpose to allow performant retranspose (#635) (44b738d)
Documentation
v1.3.0
1.3.0 (2024-04-22)
Features
- Add `Series.struct.dtypes` property (#599) (d924ec2)
- Add fine tuning `fit()` for Palm2TextGenerator (#616) (9c106bd)
- Add quantile statistic (#613) (bc82804)
- Expose `max_batching_rows` in `remote_function` (#622) (240a1ac)
- Support primary key(s) in `read_gbq` by using as the `index_col` by default (#625) (75bb240)
- Warn if location is set to unknown location (#609) (3706b4f)
Bug Fixes
- Address technical writers fb (#611) (9f8f181)
- Infer narrowest numeric type when combining numeric columns (#602) (8f9ece6)
- Use exact median implementation by default (#619) (9d205ae)
Documentation
v1.2.0
1.2.0 (2024-04-15)
Features
- Add hasnans, combine_first, update to Series (#600) (86e0f38)
- Add MultiIndex subclass. (#596) (5d0f149)
- Add pivot_table for DataFrame. (#473) (5f1d670)
- Add Series.autocorr (#605) (4ec8034)
- Support list of numerics in pandas.cut (#580) (290f95d)
Bug Fixes
- Address more technical writers feedback (#581) (4b08d92)
- Error for object dtype on read_pandas (#570) (8702dcf)
- Inverting int now does bitwise inversion rather than sign flip (#574) (5f1db8b)
- Loc setitem dtype issue. (#603) (b94bae9)
- Toc menu missing plotting name (#591) (eed12c1)
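The integer inversion fix (#574) distinguishes two operations that are easy to confuse. In plain Python, used here only to illustrate the arithmetic, bitwise inversion of a two's-complement integer `x` gives `-(x + 1)`, while a sign flip gives `-x`:

```python
value = 5

bitwise_inverse = ~value  # two's-complement inversion: -(value + 1)
sign_flip = -value        # negation only changes the sign

# Applying bitwise inversion twice returns the original value.
round_trip = ~bitwise_inverse
```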
Documentation
v1.1.0
1.1.0 (2024-04-04)
Features
- (Series|DataFrame).explode (#556) (9e32f57)
- Add `DataFrame.eval` and `DataFrame.query` (#361) (5e28ebd)
- Add ColumnTransformer save/load (#541) (9d8cf67)
- Add ml.metrics.mean_squared_error (#559) (853c25e)
- Add support for numpy expm1, log1p, floor, ceil, arctan2 ops (#505) (e8e66cf)
- Add transformers save/load (#552) (d805241)
- Allow DataFrame binary ops to align on either axis and with loc… (#544) (6d8f3af)
- Expose `DataFrame.bqclient` to assist in integrations (#519) (0be8911)
- Read_pandas accepts pandas Series and Index objects (#573) (f8821fe)
- Support `ML.GENERATE_EMBEDDING` in `PaLM2TextEmbeddingGenerator` (#539) (1156c1e)
- Support max_columns in repr and make repr more efficient (#515) (54e49cf)
Bug Fixes
- Assign NaN scalar to column error. (#513) (0a4153c)
- Don't download 100gb onto local python machine in load test (#537) (082c58b)
- Exclude list-like s parameter in plot.scatter (#568) (1caac27)
- Fix case where df.peek would fail to execute even with force=True (#511) (8eca99a)
- Fix error in `Series.drop(0)` (#575) (75dd786)
- Include all names in MultiIndex repr (#564) (b188146)
- Plot.scatter s parameter cannot accept float-like column (#563) (8d39187)
- Product operation produces float result for all input types (#501) (6873b30)
- Reloaded transformer .transform error (#569) (39fe474)
- Rename PaLM2TextEmbeddingGenerator.predict output columns to be backward compatible (#561) (4995c00)
- Respect hard stack size limit and swallow limit change exception. (#558) (4833908)
- Restore string to date/time type coercion (#565) (4ae0262)
- Sync the notebook with embedding changes (#550) (347f2dd)
- Use bytes limit on frame inlining rather than element count (#576) (659a161)
Performance Improvements
Dependencies
Documentation
- `bigframes.options.bigquery.project` and `location` are optional in some circumstances (#548) (90bcec5)
- Add "Supported pandas APIs" reference to the documentation (#542) (74c3915)
- Add General Availability banner to README (#507) (262ff59)
- Add operations in API docs (#557) (ea95761)
- Add progress_bar code sample (#508) (92a1af3)
- Add the code samples for metrics{auc, roc_auc_score, roc_curve} (#520) (5f37b09)
- Address more comments from technical writers to meet legal purposes (#571) (9084df3)
- Fix docs of ARIMAPlus.predict (#512) (3b80f95)
- Include Index in table-of-contents (#564) (b188146)
- Mark Gemini model as Pre-GA (#543) (769868b)
- Migrate the overview page to Bigframes official landing page (#536) (a0fb8bb)
v1.0.0
1.0.0 (2024-03-25)
⚠ BREAKING CHANGES
- rename model parameter `min_rel_progress` to `tol`
- `early_stop` setting no longer supported, always uses `True`
- rename model parameter `n_parallell_trees` to `n_estimators`
- rename `class_weights` to `class_weight`
- rename `learn_rate` to `learning_rate`
- PCA `n_components` supports float value and `None`, default to `None`
- rename various ml model parameters for consistency with sklearn (#491)
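For code that passes model parameters as keyword dictionaries, the renames listed above could be applied mechanically. The following is a minimal sketch under that assumption. `migrate_kwargs` is a hypothetical helper, not part of bigframes, and it covers only the renames named explicitly in this list.

```python
# Old name -> new name, as spelled in the breaking-changes list.
RENAMED = {
    "min_rel_progress": "tol",
    "n_parallell_trees": "n_estimators",
    "class_weights": "class_weight",
    "learn_rate": "learning_rate",
}
# Settings that are no longer accepted at all.
REMOVED = {"early_stop"}


def migrate_kwargs(kwargs):
    """Return a copy of kwargs using the 1.0.0 parameter names."""
    migrated = {}
    for name, value in kwargs.items():
        if name in REMOVED:
            continue  # early_stop is dropped; the behaviour is always True
        new_name = RENAMED.get(name, name)
        if new_name in migrated:
            raise ValueError(f"duplicate value for {new_name!r}")
        migrated[new_name] = value
    return migrated


# "other_param" is a placeholder for any key that passes through unchanged.
old = {"learn_rate": 0.1, "early_stop": False, "other_param": 20}
new = migrate_kwargs(old)
```

Silently dropping `early_stop` hides the fact that a caller who passed `False` now gets different behaviour, so a real migration might prefer to warn in that case.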
Features
- Add configuration option to read_gbq (#401) (85cede2)
- Add ml ARIMAPlus model params (#488) (352cb85)
- Add ml KMeans model params (#477) (23a8d9a)
- Add ml LogisticRegression model params (#481) (f959b65)
- Add ml PCA model params (#474) (fb5d83b)
- Add params for LinearRegression model (#464) (21b2188)
- Add support for Python 3.12 (#231) (df2976f)
- Allow assigning directly to Series.name property (#495) (ad0e99e)
- Ensure `Series.str.len()` can get length of array columns (#497) (10c0446)
- Option to use bq connection without check (#460) (0b3f8e5)
- PCA `n_components` supports float value and `None`, default to `None` (65c6f47)
- Rename `class_weights` to `class_weight` (65c6f47)
- Rename `learn_rate` to `learning_rate` (65c6f47)
- Rename model parameter `min_rel_progress` to `tol` (65c6f47)
- Rename model parameter `n_parallell_trees` to `n_estimators` (65c6f47)
- Rename various ml model parameters for consistency with sklearn (#491) (65c6f47)
- Support BQ regional endpoints for europe-west9, europe-west3, us-east4, and us-west1 (#504) (fbada4a)
- Support dataframe.cov (#498) (c4beafd)
- Support Series.dt.floor (#493) (2dd01c2)
- Support Series.dt.normalize (#483) (0bf1e91)
- Update plot sample to 1000 rows (#458) (60d4a7b)
Bug Fixes
- `early_stop` setting no longer supported, always uses `True` (65c6f47)
- Fix -1 offset lookups failing (#463) (2dfb9c2)
- Plot.scatter `c` argument functionalities (#494) (d6ee994)
- Properly support format param for numerical input. (#486) (ae20c35)
- Re-enable to_csv and to_json related tests (#468) (2b9a01d)
- Sampling plot cannot preserve ordering if index is not ordered (#475) (a5345fe)
- Use actual BigQuery types rather than ibis types in to_pandas (#500) (82b4f91)
Dependencies
Documentation
- Add code samples for metrics.{accuracy_score, confusion_matrix} (#478) (3e3329a)
- Add code samples for metrics.{recall_score, precision_score, f1_score} (#502) (370fe90)
- Improve API documentation (#489) (751266e)
- Update bigquery connection documentation (#499) (4bfe094)
- Update LLM + K-means notebook to handle partial failures (#496) (97afad9)
v0.26.0
0.26.0 (2024-03-20)
⚠ BREAKING CHANGES
- exclude remote models for .register() (#465)
Features
- (Series|DataFrame).plot (#438) (1c3e668)
- `read_gbq_table` supports `LIKE` as an operator in `filters` (#454) (d2d425a)
- Add DataFrame.pipe() method (#421) (95f5a6e)
- Set `force=True` by default in `DataFrame.peek()` (#469) (4e8e97d)
- Support datetime related casting in (Series|DataFrame|Index).astype (#442) (fde339b)
- Support Series.dt.strftime (#453) (8f6e955)
Bug Fixes
- Any() on empty set now correctly returns False (#471) (f55680c)
- Df.drop_na preserves columns dtype (#457) (3bab1a9)
- Disable to_json and to_csv related tests (#462) (874026d)
- Exclude remote models for .register() (#465) (73fe0f8)
- Fix broken link in covid notebook (#450) (adadb06)
- Fix broken multiindex loc cases (#467) (b519197)
- Fix grouping series on multiple other series (#455) (3971bd2)
- Groupby aggregates no longer check if grouping keys are numeric (#472) (4fbf938)
- Raise `ValueError` when `read_pandas()` receives a bigframes `DataFrame` (#447) (b28f9fd)
- Series.(to_csv|to_json) leverages bq export (#452) (718a00c)
- Warn when `read_gbq`/`read_gbq_table` uses the snapshot time cache (#441) (e16a8c0)
Documentation
- Add code samples for `ml.metrics.r2_score` (#459) (85fefa2)
- Add the docs for loc and iloc indexers (#446) (14ab8d8)
- Add the pages for at and iat indexers (#456) (340f0b5)
- Add version information to bug template (#437) (91bd39e)
- Indicate that project and location are optional in example notebooks (#451) (1df0140)