python-fastparquet

Source Files
Filename                     Size (bytes)  Size
fastparquet-2023.8.0.tar.gz  28904480      27.6 MB
python-fastparquet.changes   19492         19 KB
python-fastparquet.spec      3260          3.18 KB
Revision 29 (latest revision is 32)
Ana Guerrero (anag+factory) accepted request 1130498 from Dirk Mueller (dirkmueller) (revision 29)
- update to 2023.8.0:
  * More general timestamp units (#874)
  * ReadTheDocs V2 (#871)
  * Better roundtrip dtypes (#861, #859)
  * No convert when computing bytes-per-item for str (#858)

- Add patch to fix the test test_delta_from_def_2 on
  * row-level filtering of the data. Whereas previously, only full
    row-groups could be excluded on the basis of their parquet
    metadata statistics (if present), filtering can now be done
    within row-groups too. The syntax is the same as before,
    allowing for multiple column expressions to be combined with
    AND|OR, depending on the list structure. This mechanism
    requires two passes: one to load the columns needed to create
    the boolean mask, and another to load the columns actually
    needed in the output. This will not be faster, and may be
    slower, but in some cases it can substantially reduce memory
    use, namely when only a small fraction of rows pass the filter
    and the filter columns are not needed in the output.
  * DELTA integer encoding (read-only): experimentally working,
    but we only have one test file to verify against, since it is
    not trivial to persuade Spark to produce files encoded this
    way. DELTA can be an extremely compact representation for
  * nanosecond resolution times: the new extended "logical" types
    system supports nanoseconds alongside the previous millis and
    micros. We now emit these for the default pandas time type,
    and produce full parquet schema including both "converted" and
    "logical" type information. Note that all output has
    isAdjustedToUTC=True, i.e., these are timestamps rather than
    local time. The time-zone is stored in the metadata, as
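
The two-pass row-level filtering described in the changelog entry above can be illustrated with a small, library-free sketch. This is not fastparquet's implementation: the row-group is modelled as a plain dict of column lists, and the helper names (filter_columns, build_mask, read_filtered) are hypothetical. It assumes the filter convention in which a flat list of (column, op, value) tuples is AND-ed, and a list of such lists is an OR of AND-groups.

```python
import operator

# Comparison operators accepted in (column, op, value) filter tuples (assumed set).
OPS = {
    "==": operator.eq, "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
    "in": lambda a, b: a in b, "not in": lambda a, b: a not in b,
}


def _groups(filters):
    """Normalise filters to a list of AND-groups (outer list means OR)."""
    return filters if filters and isinstance(filters[0], list) else [filters]


def filter_columns(filters):
    """Names of the columns referenced by the filter expression."""
    return {col for group in _groups(filters) for (col, _, _) in group}


def build_mask(columns, filters, n_rows):
    """Pass 1: evaluate the filters row by row into a boolean mask."""
    return [
        any(all(OPS[op](columns[col][i], val) for col, op, val in group)
            for group in _groups(filters))
        for i in range(n_rows)
    ]


def read_filtered(row_group, wanted, filters):
    """Two-pass read of one row-group held as a dict of column lists."""
    n_rows = len(next(iter(row_group.values())))
    # Pass 1: load only the columns the filter needs and build the mask.
    needed = {c: row_group[c] for c in filter_columns(filters)}
    mask = build_mask(needed, filters, n_rows)
    # Pass 2: load the output columns, keeping only rows where the mask is True.
    return {c: [v for v, keep in zip(row_group[c], mask) if keep] for c in wanted}


row_group = {
    "id": [1, 2, 3, 4, 5],
    "score": [0.1, 0.9, 0.5, 0.95, 0.2],
    "label": ["a", "b", "a", "c", "b"],
}

# (score > 0.8) OR (label == "a" AND id < 3)
filters = [[("score", ">", 0.8)], [("label", "==", "a"), ("id", "<", 3)]]
result = read_filtered(row_group, ["id"], filters)
```

In this sketch the filter columns (score, label) are never materialised in the output, which is the situation where the changelog says the two-pass approach can save memory.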