Revisions of python-dask

Stephan Kulow (coolo) accepted request 677653 from Tomáš Chvátal (scarabeus_iv) (revision 17)
- Enable tests and switch to multibuild
Dominique Leuenberger (dimstar_suse) accepted request 603174 from Tomáš Chvátal (scarabeus_iv) (revision 2)
- update to version 0.17.2:
  * Array
    + Add broadcast_arrays for Dask Arrays (:pr:`3217`) John A Kirkham
    + Add bitwise_* ufuncs (:pr:`3219`) John A Kirkham
    + Add optional axis argument to squeeze (:pr:`3261`) John A
      Kirkham
    + Validate inputs to atop (:pr:`3307`) Matthew Rocklin
    + Avoid calls to astype in concatenate if all parts have the same
      dtype (:pr:`3301`) `Martin Durant`_
  * DataFrame
    + Fixed bug in shuffle due to aggressive truncation (:pr:`3201`)
      Matthew Rocklin
    + Support specifying categorical columns on read_parquet with
      categories=[…] for engine="pyarrow" (:pr:`3177`) Uwe Korn
    + Add dd.tseries.Resampler.agg (:pr:`3202`) Richard Postelnik
    + Support operations that mix dataframes and arrays (:pr:`3230`)
      Matthew Rocklin
    + Support extra Scalar and Delayed args in
      dd.groupby._Groupby.apply (:pr:`3256`) Gabriele Lanaro
  * Bag
    + Support joining against single-partitioned bags and delayed
      objects (:pr:`3254`) Matthew Rocklin
  * Core
    + Fixed bug when using unexpected but hashable types for keys
      (:pr:`3238`) Daniel Collins
    + Fix bug in task ordering so that we break ties consistently with
      the key name (:pr:`3271`) Matthew Rocklin
    + Avoid sorting tasks in order when the number of tasks is very
      large (:pr:`3298`) Matthew Rocklin
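The tie-breaking fix listed under Core above (:pr:`3271`) can be pictured with a small pure-Python sketch. This is only an illustration of the idea, not dask's ordering code: the helper name order_keys and the priorities mapping are hypothetical, and the sketch assumes a lower number means "run earlier".

```python
def order_keys(priorities):
    """Return task keys sorted by priority, breaking ties by key name.

    `priorities` is a hypothetical mapping of task key -> numeric priority
    (lower runs first). Sorting on (priority, str(key)) makes the result
    independent of dict insertion order, so ties resolve the same way
    on every run.
    """
    return sorted(priorities, key=lambda k: (priorities[k], str(k)))


# "a" and "b" share a priority; the key name decides their relative order.
priorities = {"b": 1, "a": 1, "c": 0}
print(order_keys(priorities))
```

Using str(key) as the secondary sort field is an assumption made here so that keys of mixed types (strings, tuples) remain comparable.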
Dominique Leuenberger (dimstar_suse) accepted request 582171 from Sebastian Wagner (sebix) (revision 1)
- correctly package bytecode
- use %license macro
- update to version 0.17.1:
  * Array
    + Corrected dimension chunking in indices (:issue:`3166`,
      :pr:`3167`) Simon Perkins
    + Inline store_chunk calls for store's return_stored option
      (:pr:`3153`) John A Kirkham
    + Compatibility with struct dtypes for NumPy 1.14.1 release
      (:pr:`3187`) Matthew Rocklin
  * DataFrame
    + Bugfix to allow column assignment of pandas
      datetimes (:pr:`3164`) Max Epstein
  * Core
    + New file-system for HTTP(S), allowing direct loading from
      specific URLs (:pr:`3160`) `Martin Durant`_
    + Fix bug when tokenizing partials with no keywords (:pr:`3191`)
      Matthew Rocklin
    + Use more recent LZ4 API (:pr:`3157`) `Thrasibule`_
    + Introduce output stream parameter for progress bar (:pr:`3185`)
      `Dieter Weber`_
- update to version 0.17.0:
  * Array
    + Added support for object-type arrays in nansum, nanmin, and
      nanmax (:issue:`3133`) Keisuke Fujii
    + Update error handling when len is called with empty chunks
      (:issue:`3058`) Xander Johnson
    + Fixes a metadata bug with store's return_stored option
      (:pr:`3064`) John A Kirkham
    + Fix a bug in optimization.fuse_slice to properly handle when
      first input is None (:pr:`3076`) James Bourbeau
    + Support arrays with unknown chunk sizes in percentile
      (:pr:`3107`) Matthew Rocklin
    + Tokenize scipy.sparse arrays and np.matrix (:pr:`3060`) Roman
      Yurchak
  * DataFrame
    + Support month timedeltas in repartition(freq=...) (:pr:`3110`)
      Matthew Rocklin
    + Avoid mutation in dataframe groupby tests (:pr:`3118`) Matthew
      Rocklin
    + read_csv, read_table, and read_parquet accept iterables of paths
      (:pr:`3124`) Jim Crist
    + Deprecates the dd.to_delayed function in favor of the existing
      method (:pr:`3126`) Jim Crist
    + Return dask.arrays from df.map_partitions calls when the UDF
      returns a numpy array (:pr:`3147`) Matthew Rocklin
    + Change handling of columns and index in dd.read_parquet to be
      more consistent, especially in handling of multi-indices
      (:pr:`3149`) Jim Crist
    + fastparquet append=True allowed to create new dataset
      (:pr:`3097`) `Martin Durant`_
    + dtype rationalization for sql queries (:pr:`3100`) `Martin
      Durant`_
  * Bag
    + Document that bag.map_partitions may receive either a list or a
      generator. (:pr:`3150`) Nir
  * Core
    + Change default task ordering to prefer nodes with few dependents
      and then many downstream dependencies (:pr:`3056`) Matthew
      Rocklin
    + Add color= option to visualize to color by task order
      (:pr:`3057`) (:pr:`3122`) Matthew Rocklin
    + Deprecate dask.bytes.open_text_files (:pr:`3077`) Jim Crist
    + Remove short-circuit hdfs reads handling due to maintenance
      costs. May be re-added in a more robust manner later
      (:pr:`3079`) Jim Crist
    + Add dask.base.optimize for optimizing multiple collections
      without computing. (:pr:`3071`) Jim Crist
    + Rename dask.optimize module to dask.optimization (:pr:`3071`)
      Jim Crist
    + Change task ordering to do a full traversal (:pr:`3066`) Matthew
      Rocklin
    + Adds an optimize_graph keyword to all to_delayed methods to
      allow controlling whether optimizations occur on
      conversion. (:pr:`3126`) Jim Crist
    + Support using pyarrow for hdfs integration (:pr:`3123`) Jim
      Crist
    + Move HDFS integration and tests into dask repo (:pr:`3083`) Jim
      Crist
    + Remove write_bytes (:pr:`3116`) Jim Crist
- specfile:
  * update copyright year
- update to version 0.16.1:
  * Array
    + Fix handling of scalar percentile values in "percentile"
      (:pr:`3021`) `James Bourbeau`_
    + Prevent "bool()" coercion from calling compute (:pr:`2958`)
      `Albert DeFusco`_
    + Add "matmul" (:pr:`2904`) `John A Kirkham`_
    + Support N-D arrays with "matmul" (:pr:`2909`) `John A Kirkham`_
    + Add "vdot" (:pr:`2910`) `John A Kirkham`_
    + Explicit "chunks" argument for "broadcast_to" (:pr:`2943`)
      `Stephan Hoyer`_
    + Add "meshgrid" (:pr:`2938`) `John A Kirkham`_ and (:pr:`3001`)
      `Markus Gonser`_
    + Preserve singleton chunks in "fftshift"/"ifftshift" (:pr:`2733`)
      `John A Kirkham`_
    + Fix handling of negative indexes in "vindex" and raise errors
      for out of bounds indexes (:pr:`2967`) `Stephan Hoyer`_
    + Add "flip", "flipud", "fliplr" (:pr:`2954`) `John A Kirkham`_
    + Add "float_power" ufunc (:pr:`2962`) (:pr:`2969`) `John A
      Kirkham`_
    + Compatibility for changes to structured arrays in the upcoming
      NumPy 1.14 release (:pr:`2964`) `Tom Augspurger`_
    + Add "block" (:pr:`2650`) `John A Kirkham`_
    + Add "frompyfunc" (:pr:`3030`) `Jim Crist`_
  * DataFrame
    + Fixed naming bug in cumulative aggregations (:issue:`3037`)
      `Martijn Arts`_
    + Fixed "dd.read_csv" when "names" is given but "header" is not
      set to "None" (:issue:`2976`) `Martijn Arts`_
    + Fixed "dd.read_csv" so that passing instances of
      "CategoricalDtype" in "dtype" will result in known categoricals
      (:pr:`2997`) `Tom Augspurger`_
    + Prevent "bool()" coercion from calling compute (:pr:`2958`)
      `Albert DeFusco`_
    + "DataFrame.read_sql()" (:pr:`2928`) on an empty database table
      returns an empty dask dataframe `Apostolos Vlachopoulos`_
    + Compatibility for reading Parquet files written by PyArrow 0.8.0
      (:pr:`2973`) `Tom Augspurger`_
    + Correctly handle the column name (`df.columns.name`) when
      reading in "dd.read_parquet" (:pr:`2973`) `Tom Augspurger`_
    + Fixed "dd.concat" losing the index dtype when the data contained
      a categorical (:issue:`2932`) `Tom Augspurger`_
    + Add "dd.Series.rename" (:pr:`3027`) `Jim Crist`_
    + "DataFrame.merge()" (:pr:`2960`) now supports merging on a
      combination of columns and the index `Jon Mease`_
    + Removed the deprecated "dd.rolling*" methods, in preparation for
      their removal in the next pandas release (:pr:`2995`) `Tom
      Augspurger`_
    + Fix metadata inference bug in which single-partition series were
      mistakenly special cased (:pr:`3035`) `Jim Crist`_
    + Add support for "Series.str.cat" (:pr:`3028`) `Jim Crist`_
  * Core
    + Improve 32-bit compatibility (:pr:`2937`) `Matthew Rocklin`_
    + Change task prioritization to avoid upwards branching
      (:pr:`3017`) `Matthew Rocklin`_
- update to version 0.16.0:
  * Fix install of fastparquet on travis (#2897)
  * Fix port for bokeh dashboard (#2889)
  * fix hdfs3 version
  * Modify hdfs import to point to hdfs3 (#2894)
  * Explicitly pass in pyarrow filesystem for parquet (#2881)
  * COMPAT: Ensure lists for multiple groupby keys (#2892)
  * Avoid list index error in repartition_freq (#2873)
  * Finish moving `infer_storage_options` (#2886)
  * Support arrow in `to_parquet`. Several other parquet
    cleanups. (#2868)
  * Bugfix: Filesystem object not passed to pyarrow reader (#2527)
  * Fix py34 build
  * Fixup s3 tests (#2875)
  * Close resource profiler process on __exit__ (#2871)
  * Add changelog for to_parquet changes. [ci skip]
  * A few parquet cleanups (#2867)
  * Fixed fillna with Series (#2810)
  * Error nicely on parse dates failure in read_csv (#2863)
  * Fix empty dataframe partitioning for numpy 1.10.4 (#2862)
  * Test `unique`'s inverse mapping's shape (#2857)
  * Move `thread_state` out of the top namespace (#2858)
  * Explain unique's steps (#2856)
  * fix and test for issue #2811 (#2818)
  * Minor tweaks to `_unique_internal` optional result handling
    (#2855)
  * Update dask interface during XArray integration (#2847)
  * Remove unnecessary map_partitions in aggregate (#2712)
  * Simplify `_unique_internal` (#2850)
  * Add more tests for read_parquet(engine='pyarrow') (#2822)
  * Do not raise exception when calling set_index on empty dataframe
    #2819 (#2827)
  * Test unique on more data (#2846)
  * Do not except on set_index on text column with empty partitions
    #2820 (#2831)
  * Compat for bokeh 0.12.10 (#2844)
  * Support `return_*` arguments with `unique` (#2779)
  * Fix installing of pandas dev (#2838)
  * Squash a few warnings in dask.array (#2833)
  * Array optimizations don't elide some getter calls (#2826)
  * test against pandas rc (#2814)
  * df.astype(categorical_dtype) -> known categoricals (#2835)
  * Fix cloudpickle test (#2836)
  * BUG: Quantile with missing data (#2791)
  * API: remove dask.async (#2828)
  * Adds comma to flake8 section in setup.cfg (#2817)
  * Adds asarray and asanyarray to the dask.array public API (#2787)
  * flake8 now checks bare excepts (#2816)
  * CI: Update for new flake8 / pycodestyle (#2808)
  * Fix concat series bug (#2800)
  * Typo in the docstring of read_parquet's filters param (#2806)
  * Docs update (#2803)
  * minor doc changes in bag.core (#2797)
  * da.random.choice works with array args (#2781)
  * Support broadcasting 0-length dimensions (#2784)
  * ResourceProfiler plot works with single point (#2778)
  * Implement Dask Array's unique to be lazy (#2775)
  * Dask Collection Interface
  * Reduce test memory usage (#2782)
  * Deprecate vnorm (#2773)
  * add auto-import of gcsfs (#2776)
  * Add allclose (#2771)
  * Remove `random.different_seeds` from API docs (#2772)
  * Follow-up for atleast_nd (#2765)
  * Use get_worker().client.get if available (#2762)
  * Link PR for "Allow tuples as sharedict keys" (#2766)
  * Allow tuples as sharedict keys (#2763)
  * update docs to use flatten vs concat (#2764)
  * Add atleast_nd functions (#2760)
  * Consolidate changelog for 0.15.4 (#2759)
  * Add changelog template for future date (#2758)
- update to version 0.15.4:
  * Drop s3fs requirement (#2750)
  * Support -1 as an alias for dimension size in chunks (#2749)
  * Handle zero dimension when rechunking (#2747)
  * Pandas 0.21 compatibility (#2737)
  * API: Add `.str` accessor for Categorical with object dtype (#2743)
  * Fix install failures
  * Reduce memory usage
  * A few test cleanups
  * Fix #2720 (#2729)
  * Pass on file_scheme to fastparquet (#2714)
  * Support indexing with np.int (#2719)
  * Tree reduction support for dask.bag.Bag.foldby (#2710)
  * Update link to IPython parallel docs (#2715)
  * Call mkdir from correct namespace in array.to_npy_stack. (#2709)
  * add int96 times to parquet writer (#2711)
- update to version 0.15.3:
  * add .github/PULL_REQUEST_TEMPLATE.md file
  * Make `y` optional in dask.array.learn (#2701)
  * Add apply_over_axes (#2702)
  * Use apply_along_axis name in Dask (#2704)
  * Tweak apply_along_axis's pre-NumPy 1.13.0 error (#2703)
  * Add apply_along_axis (#2698)
  * Use travis conditional builds (#2697)
  * Skip days in daily_stock that have nan values (#2693)
  * TST: Have array assert_eq check scalars (#2681)
  * Add schema keyword to read_sql (#2582)
  * Only install pytest-runner if needed (#2692)
  * Remove resize tool from bokeh plots (#2688)
  * Add ptp (#2691)
  * Catch warning from numpy in subs (#2457)
  * Publish Series methods in dataframe api (#2686)
  * Fix norm keepdims (#2683)
  * Dask array slicing with boolean arrays (#2658)
  * repartition works with mixed categoricals (#2676)
  * Merge pull request #2667 from martindurant/parquet_file_schema
  * Fix for parquet file schemes
  * Optional axis argument for cumulative functions (#2664)
  * Remove partial_by_order
  * Support literals in atop
  * [ci skip] Add flake8 note in developer doc page (#2662)
  * Add filenames return for ddf.to_csv and bag.to_textfiles as they
    both… (#2655)
  * CLN: Remove redundant code, fix typos (#2652)
  * [docs] company name change from Continuum to Anaconda (#2660)
  * Fix what happened when combining partition_on and append in
    to_parquet (#2645)
  * WIP: Add user defined aggregations (#2344)
  * [docs] new cheatsheet (#2649)
  * Masked arrays (#2301)
  * Indexing with an unsigned integer array (#2647)
  * ENH: Allow the groupby by param to handle columns and index levels
    (#2636)
  * update copyright date (#2642)
  * python setup.py test runs py.test (#2641)
  * Avoid using operator.itemgetter in dask.dataframe (#2638)
  * Add `*_like` array creation functions (#2640)
  * Consistent slicing names (#2601)
  * Replace Continuum Analytics with Anaconda Inc. (#2631)
  * Implement Series.str[index] (#2634)
  * Support complex data with vnorm (#2621)
- changes from version 0.15.2:
  * BUG: setitem should update divisions (#2622)
  * Allow dataframe.loc with numpy array (#2615)
  * Add link to Stack Overflow's mcve docpage to support docs (#2612)
  * Improve dtype inference and reflection (#2571)
  * Add ediff1d (#2609)
  * Optimize concatenate on singleton sequences (#2610)
  * Add diff (#2607)
  * Document norm in Dask Array API (#2605)
  * Add norm (#2597)
  * Don't check for memory leaks in distributed tests (#2603)
  * Include computed collection within sharedict in delayed (#2583)
  * Reorg array (#2595)
  * Remove `expand` parameter from df.str.split (#2593)
  * Normalize `meta` on call to `dd.from_delayed` (#2591)
  * Remove bare `except:` blocks and test that none exist. (#2590)
  * Adds choose method to dask.array.Array (#2584)
  * Generalize vindex in dask.array (#2573)
  * Clear `_cached_keys` on name change in dask.array (#2572)
  * Don't render None for unknown divisions (#2570)
  * Add missing initialization to CacheProfiler (#2550)
  * Add argwhere, *nonzero, where (cond) (#2539)
  * Fix indices error message (#2565)
  * Fix and secure some references (#2563)
  * Allows for read_hdf to accept an iterable of files (#2547)
  * Allow split on rechunk on first pass (#2560)
  * Improvements to dask.array.where (#2549)
  * Adds isin method to dask.dataframe.DataFrame (#2558)
  * Support dask array conditional in compress (#2555)
  * Clarify ResourceProfiler docstring [ci skip] (#2553)
  * In compress, use Dask to expand condition array (#2545)
  * Support compress with axis as None (#2541)
  * df.idxmax/df.idxmin work with empty partitions (#2542)
  * FIX typo in accumulate docstring (#2552)
  * da.where works with non-bool condition (#2543)
  * da.repeat works with negative axis (#2544)
  * Check metadata in `dd.from_delayed` (#2534)
  * TST: clean up test directories in shuffle (#2535)
  * Do not attempt to compute divisions on empty dataframe. (#2529)
  * Remove deprecated bag behavior (#2525)
  * Updates read_hdf docstring (#2518)
  * Add dd.to_timedelta (#2523)
  * Better error message for read_csv (#2522)
  * Remove spurious keys from map_overlap graph (#2520)
  * Do not compare x.dim with None in array. (#1847)
  * Support concat for categorical MultiIndex (#2514)
  * Support for callables in df.assign (#2513)
- Implement single-spec version
- Update source URL.
- Split classes into own subpackages to lighten base dependencies.
- Update to version 0.15.1
  *  Add storage_options to to_textfiles and to_csv (:pr:`2466`)
  *  Rechunk and simplify rfftfreq (:pr:`2473`), (:pr:`2475`)
  *  Better support ndarray subclasses (:pr:`2486`)
  *  Import star in dask.distributed (:pr:`2503`)
  *  Threadsafe cache handling with tokenization (:pr:`2511`)
- Update to version 0.15.0
  + Array
    *  Add dask.array.stats submodule (:pr:`2269`)
    *  Support ``ufunc.outer`` (:pr:`2345`)
    *  Optimize fancy indexing by reducing graph overhead (:pr:`2333`) (:pr:`2394`)
    *  Faster array tokenization using alternative hashes (:pr:`2377`)
    *  Added the matmul ``@`` operator (:pr:`2349`)
    *  Improved coverage of the ``numpy.fft`` module (:pr:`2320`) (:pr:`2322`) (:pr:`2327`) (:pr:`2323`)
    *  Support NumPy's ``__array_ufunc__`` protocol (:pr:`2438`)
  + Bag
    *  Fix bug where reductions on bags with no partitions would fail (:pr:`2324`)
    *  Add broadcasting and variadic ``db.map`` top-level function.  Also remove
       auto-expansion of tuples as map arguments (:pr:`2339`)
    *  Rename ``Bag.concat`` to ``Bag.flatten`` (:pr:`2402`)
  + DataFrame
    *  Parquet improvements (:pr:`2277`) (:pr:`2422`)
  + Core
    *  Move dask.async module to dask.local (:pr:`2318`)
    *  Support callbacks with nested scheduler calls (:pr:`2397`)
    *  Support pathlib.Path objects as uris  (:pr:`2310`)
- Update to version 0.14.3
  + DataFrame
    * Pandas 0.20.0 support
- Update to version 0.14.2
  + Array
    * Add da.indices (:pr:`2268`), da.tile (:pr:`2153`), da.roll (:pr:`2135`)
    * Simultaneously support drop_axis and new_axis in da.map_blocks (:pr:`2264`)
    * Rechunk and concatenate work with unknown chunksizes (:pr:`2235`) and (:pr:`2251`)
    * Support non-numpy container arrays, notably sparse arrays (:pr:`2234`)
    * Tensordot contracts over multiple axes (:pr:`2186`)
    * Allow delayed targets in da.store (:pr:`2181`)
    * Support interactions against lists and tuples (:pr:`2148`)
    * Constructor plugins for debugging (:pr:`2142`)
    * Multi-dimensional FFTs (single chunk) (:pr:`2116`)
  + Bag
    * to_dataframe enforces consistent types (:pr:`2199`)
  + DataFrame
    * Set_index always fully sorts the index (:pr:`2290`)
    * Support compatibility with pandas 0.20.0 (:pr:`2249`), (:pr:`2248`), and (:pr:`2246`)
    * Support Arrow Parquet reader (:pr:`2223`)
    * Time-based rolling windows (:pr:`2198`)
    * Repartition can now create more partitions, not just fewer (:pr:`2168`)
  + Core
    * Always use absolute paths when on POSIX file system (:pr:`2263`)
    * Support user provided graph optimizations (:pr:`2219`)
    * Refactor path handling (:pr:`2207`)
    * Improve fusion performance (:pr:`2129`), (:pr:`2131`), and (:pr:`2112`)
- Update to version 0.14.1
  + Array
    * Micro-optimize optimizations (:pr:`2058`)
    * Change slicing optimizations to avoid fusing raw numpy arrays (:pr:`2075`)
      (:pr:`2080`)
    * Dask.array operations now work on numpy arrays (:pr:`2079`)
    * Reshape now works in a much broader set of cases (:pr:`2089`)
    * Support deepcopy python protocol (:pr:`2090`)
    * Allow user-provided FFT implementations in ``da.fft`` (:pr:`2093`)
  + Bag
  + DataFrame
    * Fix to_parquet with empty partitions (:pr:`2020`)
    * Optional ``npartitions='auto'`` mode in ``set_index`` (:pr:`2025`)
    * Optimize shuffle performance (:pr:`2032`)
    * Support efficient repartitioning along time windows like
      ``repartition(freq='12h')`` (:pr:`2059`)
    * Improve speed of categorize (:pr:`2010`)
    * Support single-row dataframe arithmetic (:pr:`2085`)
    * Automatically avoid shuffle when setting index with a sorted column
      (:pr:`2091`)
    * Improve integer-na handling in read_csv (:pr:`2098`)
  + Delayed
    * Repeated attribute access on delayed objects uses the same key (:pr:`2084`)
  + Core
    *  Improve naming of nodes in dot visuals to avoid generic ``apply``
       (:pr:`2070`)
    *  Ensure that worker processes have different random seeds (:pr:`2094`)
- Update to version 0.14.0
  + Array
    * Fix corner cases with zero shape and misaligned values in ``arange``
    * Improve concatenation efficiency (:pr:`1923`)
    * Avoid hashing in ``from_array`` if name is provided (:pr:`1972`)
  + Bag
    * Repartition can now increase number of partitions (:pr:`1934`)
    * Fix bugs in some reductions with empty partitions (:pr:`1939`), (:pr:`1950`),
      (:pr:`1953`)
  + DataFrame
    * Support non-uniform categoricals (:pr:`1877`), (:pr:`1930`)
    * Groupby cumulative reductions (:pr:`1909`)
    * DataFrame.loc indexing now supports lists (:pr:`1913`)
    * Improve multi-level groupbys (:pr:`1914`)
    * Improved HTML and string repr for DataFrames (:pr:`1637`)
    * Parquet append (:pr:`1940`)
    * Add ``dd.demo.daily_stock`` function for teaching (:pr:`1992`)
  + Delayed
    * Add ``traverse=`` keyword to delayed to optionally avoid traversing nested
      data structures (:pr:`1899`)
    * Support Futures in from_delayed functions (:pr:`1961`)
    * Improve serialization of decorated delayed functions (:pr:`1969`)
  + Core
    * Improve windows path parsing in corner cases (:pr:`1910`)
    * Rename tasks when fusing (:pr:`1919`)
    * Add top level ``persist`` function (:pr:`1927`)
    * Propagate ``errors=`` keyword in byte handling (:pr:`1954`)
    * Dask.compute traverses Python collections (:pr:`1975`)
    * Structural sharing between graphs in dask.array and dask.delayed (:pr:`1985`)
- Update to version 0.13.0
  + Array
    * Mandatory dtypes on dask.array.  All operations maintain dtype information
      and UDF functions like map_blocks now require a dtype= keyword if it cannot
      be inferred.  (:pr:`1755`)
    * Support arrays without known shapes, such as arise when slicing arrays with
      arrays or converting dataframes to arrays (:pr:`1838`)
    * Support mutation by setting one array with another (:pr:`1840`)
    * Tree reductions for covariance and correlations.  (:pr:`1758`)
    * Add SerializableLock for better use with distributed scheduling (:pr:`1766`)
    * Improved atop support (:pr:`1800`)
    * Rechunk optimization (:pr:`1737`), (:pr:`1827`)
  + Bag
    * Avoid wrong results when recomputing the same groupby twice (:pr:`1867`)
  + DataFrame
    * Add ``map_overlap`` for custom rolling operations (:pr:`1769`)
    * Add ``shift`` (:pr:`1773`)
    * Add Parquet support (:pr:`1782`) (:pr:`1792`) (:pr:`1810`), (:pr:`1843`),
      (:pr:`1859`), (:pr:`1863`)
    * Add missing methods combine, abs, autocorr, sem, nsmallest, first, last,
      prod, (:pr:`1787`)
    * Approximate nunique (:pr:`1807`), (:pr:`1824`)
    * Reductions with multiple output partitions (for operations like
      drop_duplicates) (:pr:`1808`), (:pr:`1823`) (:pr:`1828`)
    * Add delitem and copy to DataFrames, increasing mutation support (:pr:`1858`)
  + Delayed
    * Changed behaviour for ``delayed(nout=0)`` and ``delayed(nout=1)``:
      ``delayed(nout=1)`` does not default to ``out=None`` anymore, and
      ``delayed(nout=0)`` is also enabled. I.e. functions with return
      tuples of length 1 or 0 can be handled correctly. This is especially
      handy if functions with a variable number of outputs are wrapped by
      ``delayed``. E.g. a trivial example:
      ``delayed(lambda *args: args, nout=len(vals))(*vals)``
  + Core
    * Refactor core byte ingest (:pr:`1768`), (:pr:`1774`)
    * Improve import time (:pr:`1833`)
- update to version 0.12.0:
  * update changelog (#1757)
  * Avoids spurious warning message in concatenate (#1752)
  * CLN: cleanup dd.multi (#1728)
  * ENH: da.ufuncs now supports DataFrame/Series (#1669)
  * Faster array slicing (#1731)
  * Avoid calling list on partitions (#1747)
  * Fix slicing error with None and ints (#1743)
  * Add da.repeat (#1702)
  * ENH: add dd.DataFrame.resample (#1741)
  * Unify column names in dd.read_csv (#1740)
  * replace empty with random in test to avoid nans
  * Update diagnostics plots (#1736)
  * Allow atop to change chunk shape (#1716)
  * ENH: DataFrame.loc now supports 2d indexing (#1726)
  * Correct shape when indexing with Ellipsis and None
  * ENH: Add DataFrame.pivot_table (#1729)
  * CLN: cleanup DataFrame class handling (#1727)
  * ENH: Add DataFrame.combine_first (#1725)
  * ENH: Add DataFrame all/any (#1724)
  * micro-optimize _deps (#1722)
  * A few small tweaks to da.Array.astype (#1721)
  * BUG: Fixed metadata lookup failure in Accessor (#1706)
  * Support auto-rechunking in stack and concatenate (#1717)
  * Forward `get` kwarg in df.to_csv (#1715)
  * Add rename support for multi-level columns (#1712)
  * Update paid support section
  * Add `drop` to reset_index (#1711)
  * Cull dask.arrays on slicing (#1709)
  * Update dd.read_* functions in docs
  * WIP: Feature/dataframe aggregate (implements #1619) (#1678)
  * Add da.round (#1708)
  * Executor -> Client
  * Add support of getitem for multilevel columns (#1697)
  * Prepend optimization keywords with name of optimization (#1690)
  * Add dd.read_table (#1682)
  * Fix dd.pivot_table dtype to be deterministic (#1693)
  * da.random with state is consistent across sizes (#1687)
  * Remove `raises`, use pytest.raises instead (#1679)
  * Remove unnecessary calls to list (#1681)
  * Dataframe tree reductions (#1663)
  * Add global optimizations to compute (#1675)
  * TST: rename dataframe eq to assert_eq (#1674)
  * ENH: Add DataFrame/Series.align (#1668)
  * CLN: dataframe.io (#1664)
  * ENH: Add DataFrame/Series clip_xxx (#1667)
  * Clear divisions on single_partitions_merge (#1666)
  * ENH: add dd.pivot_table (#1665)
  * Typo in `use-cases`? (#1670)
  * add distributed follow link doc page
  * Dataframe elemwise (#1660)
  * Windows file and endline test handling (#1661)
  * remove old badges
  * Fix #1656: failures when parallel testing (#1657)
  * Remove use of multiprocessing.Manager (#1652) (#1653)
  * A few fixes for `map_blocks` (#1654)
  * Automatically expand chunking in atop (#1644)
  * Add AppVeyor configuration (#1648)
  * TST: move flake8 to travis script (#1655)
  * CLN: Remove unused funcs (#1638)
  * Implementing .size and groupby size method (#1627) (#1649)
  * Use strides, shape, and offset in memmap tokenize (#1646)
  * Validate scalar metadata is scalar (#1642)
  * Convert readthedocs links for their .org -> .io migration for
    hosted projects (#1639)
  * CLN: little cleanup of dd.categorical (#1635)
  * Signature of Array.transpose matches numpy (#1632)
  * Error nicely when indexing Array with Array (#1629)
  * ENH: add DataFrame.get_xtype_counts (#1634)
  * PEP8: some fixes (#1633)
- changes from version 0.11.1:
  * support uniform index partitions in set_index(sorted) (#1626)
  * Groupby works with multiprocessing (#1625)
  * Use a nonempty index in _maybe_partial_time_string
  * Fix segfault in groupby-var
  * Support Pandas 0.19.0
  * Deprecations (#1624)
  * work-around for ddf.info() failing because of
    https://github.com/pydata/pandas/issues/14368 (#1623)
  * .str accessor needs to pass thru both args & kwargs (#1621)
  * Ensure dtype is provided in additional tests (#1620)
  * coerce rounded numbers to int in dask.array.ghost (#1618)
  * Use assert_eq everywhere in dask.array tests (#1617)
  * Update documentation (#1606)
  * Support new_axes= keyword in atop (#1612)
  * pass through node_attr and edge_attr in dot_graph (#1614)
  * Add swapaxes to dask array (#1611)
  * add clip to Array (#1610)
  * Add atop(concatenate=False) keyword argument (#1609)
  * Better error message on metadata inference failure (#1598)
  * ENH/API: Enhanced Categorical Accessor (#1574)
  * PEP8: dataframe fix except E127,E402,E501,E731 (#1601)
  * ENH: dd.get_dummies for categorical Series (#1602)
  * PEP8: some fixes (#1605)
  * Fix da.learn tests for scikit-learn release (#1597)
  * Suppress warnings in psutil (#1589)
  * avoid more timeseries warnings (#1586)
  * Support inplace operators in dataframe (#1585)
  * Squash warnings in resample (#1583)
  * expand imports for dask.distributed (#1580)
  * Add indicator keyword to dd.merge (#1575)
  * Error loudly if `nrows` used in read_csv (#1576)
  * Add versioneer (#1569)
  * Strengthen statement about gitter for developers in docs
  * Raise IndexError on out of bounds slice. (#1579)
  * ENH: Support Series in read_hdf (#1577)
  * COMPAT/API: DataFrame.categorize missing values (#1578)
  * Add `pipe` method to dask.dataframe (#1567)
  * Sample from `read_bytes` ends on a delimiter (#1571)
  * Remove mention of bag join in docs (#1568)
  * Tokenize mmap works without filename (#1570)
  * String accessor works with indexes (#1561)
  * corrected links to documentation from Examples (#1557)
  * Use conda-forge channel in travis (#1559)
  * add s3fs to travis.yml (#1558)
  * ENH: DataFrame.select_dtypes (#1556)
  * Improve slicing performance (#1539)
  * Check meta in `__init__` of _Frame
  * Fix metadata in Series.getitem
  * A few changes to `dask.delayed` (#1542)
  * Fixed read_hdf example (#1544)
  * add section on distributed computing with link to toc
  * Fix spelling (#1535)
  * Only fuse simple indexing with getarray backends (#1529)
  * Deemphasize graphs in docs (#1531)
  * Avoid pickle when tokenizing __main__ functions (#1527)
  * Add changelog doc going up to dask 0.6.1 (2015-07-23). (#1526)
  * update dataframe docs
  * update index
  * Update to highlight the use of glob based file naming option for
    df exports (#1525)
  * Add custom docstring to dd.to_csv, mentioning that one file per
    partition is written (#1524)
  * Run slow tests in Travis for all Python versions, even if coverage
    check is disabled. (#1523)
  * Unify example doc pages into one (#1520)
  * Remove lambda/inner functions in dask.dataframe (#1516)
  * Add documentation for dataframe metadata (#1514)
  * "dd.map_partitions" works with scalar outputs (#1515)
  * meta_nonempty returns types of correct size (#1513)
  * add memory use note to tsqr docstring
  * Fix slow consistent keyname test (#1510)
  * Chunks check (#1504)
  * Fix last 'line' in sample; prevents open quotes. (#1495)
  * Create new threadpool when operating from thread (#1487)
  * Add finalize- prefix to dask.delayed collections
  * Move key-split from distributed to dask
  * State that delayed values should be lists in bag.from_delayed
    (#1490)
  * Use lists in db.from_sequence (#1491)
  * Implement user defined aggregations (#1483)
  * Field access works with non-scalar fields (#1484)
- Update to 0.11.0
  * DataFrames now enforce knowing full metadata (columns, dtypes)
    everywhere. Previously we would operate in an ambiguous state
    when functions lost dtype information (such as apply). Now all
    dataframes always know their dtypes and raise errors asking for
    information if they are unable to infer (which they usually
    can). Some internal attributes like _pd and _pd_nonempty have
    been moved.
  * The internals of the distributed scheduler have been refactored
    to transition tasks between explicit states. This improves
    resilience, reasoning about scheduling, plugin operation, and
    logging. It also makes the scheduler code easier to understand
    for newcomers.
  * Breaking Changes
    + The distributed.s3 and distributed.hdfs namespaces are gone.
      Use protocols in normal methods like read_text('s3://...')
      instead.
    + Dask.array.reshape now errs in some cases where previously
      it would have created a very large number of tasks
- update to version 0.10.2:
  * raise informative error on merge(on=frame)
  * Fix crash with -OO Python command line (#1388)
  * [WIP] Read hdf partitioned (#1407)
  * Add dask.array.digitize. (#1409)
  * Adding documentation to create dask DataFrame from HDF5 (#1405)
  * Unify shuffle algorithms (#1404)
  * dd.read_hdf: clear errors on exceeding row numbers (#1406)
  * Rename `get_division` to `get_partition`
  * Add nice error messages on import failures
  * Use task-based shuffle in hash_joins (#1383)
  * Fixed #1381: Reimplemented DataFrame.repartition(npartitions=N) so
    it doesn't require indexing and just coalesces existing partitions
    without shuffling/balancing (#1396)
  * Import visualize from dask.diagnostics in docs
  * Backport `equal_nans` to older version of numpy
  * Improve checks for dtype and shape in dask.array
  * Progress bar process should be a daemon
  * LZMA may not be available in python 3 (#1391)
  * dd.to_hdf: multiple files multiprocessing avoid locks (#1384)
  * dir works with numeric column names
  * Dataframe groupby works with numeric column names
  * Use fsync when appending to partd
  * Fix pickling issue in dataframe to_bag
  * Add documentation for dask.dataframe.to_hdf
  * Fixed a copy-paste typo in DataFrame.map_partitions docstring
  * Fix 'visualize' import location in diagnostics documentation
    (#1376)
  * update cheat sheet (#1371)
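As a rough picture of what "coalescing existing partitions without shuffling" means in the repartition entry above, the following standalone sketch merges adjacent partitions into fewer groups while keeping row order. The function `coalesce_partitions` is a hypothetical illustration using plain lists, not dask's implementation, and the even split of boundaries is an assumption.

```python
def coalesce_partitions(partitions, npartitions):
    """Merge adjacent partitions into at most `npartitions` groups.

    Rows never move past their neighbours, so no shuffle or index
    lookup is needed; only neighbouring partitions are concatenated.
    """
    if npartitions < 1:
        raise ValueError("npartitions must be at least 1")
    n = len(partitions)
    npartitions = min(npartitions, n)
    if npartitions == 0:
        return []
    # Boundaries are spread so group sizes differ by at most one partition.
    bounds = [i * n // npartitions for i in range(npartitions + 1)]
    return [
        [row for part in partitions[lo:hi] for row in part]
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5], [6], [7]]
    print(coalesce_partitions(parts, 2))
```

Because only whole neighbouring partitions are joined, the output sizes can be uneven, which matches the "without balancing" wording of the entry.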
- update to version 0.10.1:
  * `inline` no longer removes keys (#1356)
  * avoid c: in infer_storage_options (#1369)
  * Protect reductions against empty partitions (#1361)
  * Add doc examples for dask.array.histogram. (#1363)
  * Fix typo in pip install requirements path (#1364)
  * avoid unnecessary dependencies between save tasks in
    dataframe.to_hdf (#1293)
  * remove xfail mark for blosc missing const
  * Add `anon=True` for read from s3 test
  * `subs` doesn't needlessly compare keys and values
  * Use pytest.importorskip instead of try/except/return pattern
  * Fixes for bokeh 0.12.0
  * Multiprocess scheduler handles unpickling errors
  * array.random with array-like parameters (#1327)
  * Fixes issue #1337 (#1338)
  * Remove dask runtime dependence on mock 2.7 backport.
  * Load known but external protocols automatically (#1325)
  * Add center argument to Series/DataFrame.rolling (#1280)
  * Add Bag.random_sample method. (#1332)
  * Correct docs install command and add missing required packages
    (#1333)
  * Mark the 4 slowest tests as slow to get a faster suite by
    default. (#1334)
  * Travis: Install mock package in Python 2.7.
  * Automatic blocksize for read_csv based on available memory and
    number of cores.
  * Replace "Matthew Rocklin" with "Dask Development Team" (#1329)
  * Support column assignment in DataFrame (#1322)
  * Few travis fixes, pandas version >= 0.18.0 (#1314)
  * Don't run hdf test if pytables package is not present. (#1323)
  * Add delayed.compute to api docs.
  * Support datetimes in DataFrame._build_pd (#1319)
  * Test setting the index with datetime with timezones, which is a
    pandas-defined dtype (#1315)
  * Add s3fs to requirements (#1316)
  * Pass dtype information through in Series.astype (#1320)
  * Add draft of development guidelines (#1305)
  * Skip tests needing optional package when it's not present. (#1318)
  * DOC: Document DataFrame.categorize
  * make dd.to_csv support writing to multiple csv files (#1303)
  * quantiles for repartitioning (#1261)
  * DOC: Minimal doc for get_sync (#1312)
  * Pass through storage_options in db.read_text (#1304)
  * Fixes #1237: correctly propagate storage_options through read_*
    APIs and use urlsplit to automatically get remote connection
    settings (#1269)
  * TST: Travis build matrix to specify numpy/pandas ver (#1300)
  * amend doc string to Bag.to_textfiles
  * Return dask.Delayed when saving files with compute = false (#1286)
  * Support empty or small dataframes in from_pandas (#1290)
  * Add validation and tests for order breaking name_function (#1275)
  * ENH: dataframe now supports partial string selection (#1278)
  * Fix typo in spark-dask docs
  * added note and verbose exception about CSV parsing errors (#1287)
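The entry above about using urlsplit to pick up remote connection settings can be illustrated with the standard library alone. The helper `split_storage_url` and the option names it returns are hypothetical; this is a minimal sketch of the idea, not the code dask uses.

```python
from urllib.parse import urlsplit


def split_storage_url(url):
    """Split a URL into a protocol, a path, and any connection settings."""
    parts = urlsplit(url)
    # A bare local path has no scheme; treat it as the "file" protocol.
    options = {"protocol": parts.scheme or "file", "path": parts.path}
    # Only include settings that are actually present in the URL.
    if parts.hostname:
        options["host"] = parts.hostname
    if parts.port:
        options["port"] = parts.port
    if parts.username:
        options["username"] = parts.username
    if parts.password:
        options["password"] = parts.password
    return options


if __name__ == "__main__":
    print(split_storage_url("hdfs://user:pw@namenode:8020/data/file.csv"))
```

Explicit storage options passed by the caller could then be layered on top of the values inferred from the URL.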
- update to version 0.10.0:
  * Add parametrization to merge tests
  * Add more challenging types to nonempty_sample_df test
  * Windows fixes
  * TST: Fix coveralls badge (#1276)
  * Sort index on shuffle (#1274)
  * Update specification docs to reflect new spec.
  * Add groupby docs (#1273)
  * Update spark docs
  * Rolling class receives normal arguments (unchecked other than
    pandas call), stores at
  * Reduce communication in rolling operations #1242 (#1270)
  * Fix Shuffle (#1255)
  * Work on earlier versions of Pandas
  * Handle additional Pandas types
  * Use non-empty fake dataframe in merge operations
  * Add failing test for merge case
  * Add utility function to create sample dataframe
  * update release procedure
  * amend doc string to Bag.to_textfiles (#1258)
  * Drop Python 2.6 support (#1264)
  * Clean DataFrame naming conventions (#1263)
  * Fix some bugs in the rolling implementation.
  * Fix core.get to use new spec
  * Make graph definition recursive
  * Handle empty partitions in dask.bag.to_textfiles
  * test index.min/max
  * Add regression test for non-ndarray slicing
  * Standardize dataframe keynames
  * bump csv sample size to 256k (#1253)
  * Switch tests to utils.tmpdir (#1251)
  * Fix dot_graph filename split bug
  * Correct documentation to reflect argument existing now.
  * Allow non-zero axis for .rolling (for application over columns)
  * Fix scheduler behavior for top-level lists
  * Various spelling mistakes in docstrings, comments, exception
    messages, and a filename
  * Fix typo. (#1247)
  * Fix tokenize in dask.delayed
  * Remove unused imports, pep8 fixes
  * Fix bug in slicing optimization
  * Add Task Shuffle (#1186)
  * Add bytes API (#1224)
  * Add dask_key_name to docs, fix bug in methods
  * Allow formatting in dask.dataframe.to_hdf path and key parameters
  * Match pandas' exceptions a bit closer in the rolling API. Also,
    correct computation f
  * Add tests to package (#1231)
  * Document visualize method (#1234)
  * Skip new rolling API's tests if the pandas we have is too old.
  * Improve df_or_series.rolling(...) implementation.
  * Remove `iloc` property on `dask.dataframe`
  * Support for the new pandas rolling API.
  * test delayed names are different under kwargs
  * Add Hussain Sultan to AUTHORS
  * Add `optimize_graph` keyword to multiprocessing get
  * Add `optimize_graph` keyword to `compute`
  * Add dd.info() (#1213)
  * Cleanup base tests
  * Add groupby documentation stub
  * pngmath is deprecated in sphinx 1.4
  * A few docfixes
  * Extract dtype in dd.from_bcolz
  * Throw NotImplementedError if old toolz.accumulate
  * Add isnull and notnull for dataframe
  * Add dask.bag.accumulate
  * Fix categorical partitioning
  * create single lock for glob read_hdf
  * Fix failing from_url doctest
  * Add missing api to bag docs
  * Add Skipper Seabold to AUTHORS.
  * Don't use mutable default argument
  * Fix typo
  * Ensure to_task_dasks always returns a task
  * Fix dir for dataframe objects
  * Infer metadata in dd.from_delayed
  * Fix some closure issues in dask.dataframe
  * Add storage_options keyword to read_csv
  * Define finalize function for dask.dataframe.Scalar
  * py26 compatibility
  * add stacked logos to docs
  * test from-array names
  * rename from_array tasks
  * add atop to array docs
  * Add motivation and example to delayed docs
  * splat out delayed values in compute docs
  * Fix optimize docs
  * add html page with logos
  * add dask logo to documentation images
  * Few pep8 cleanups to dask.dataframe.groupby
  * Groupby aggregate works with list of columns
  * Use different names for input and output in from_array
  * Don't enforce same column names
  * don't write header for first block in csv
  * Add var and std to DataFrame groupby (#1159)
  * Move conda recipe to conda-forge (#1162)
  * Use function names in map_blocks and elemwise (#1163)
  * add hyphen to delayed name (#1161)
  * Avoid shuffles when merging with Pandas objects (#1154)
  * Add DataFrame.eval
  * Ensure future imports
  * Add db.Bag.unzip
  * Guard against shape attributes that are not sequences
  * Add dask.array.multinomial
- update to version 0.9.0:
  * No upstream changelog
- update to version 0.8.2:
  * No upstream changelog
- update to version 0.8.1:
  * No upstream changelog
- update to version 0.8.0:
  * No upstream changelog
- update to version 0.7.5:
  * No upstream changelog
- update to version 0.7.0:
  * No upstream changelog
- update to version 0.6.1:
  * No upstream changelog
  
- Update to 0.6.0
  * No upstream changelog
- Update to 0.5.0
  * No upstream changelog
- Initial version