Name-indexed data processing tool

http://johnkerl.org/miller/doc

Miller (mlr) allows name-indexed data such as CSV and JSON files to be
processed with functions equivalent to sed, awk, cut, join, sort, etc. It
converts between formats, preserves headers when sorting or reversing, and
streams data where possible so its memory requirements stay small. It works
well with pipes and can consume the output of "tail -f".

Source Files
Filename             Size
miller-5.2.2.tar.gz  4.79 MB (5024653 bytes)
miller.changes       4.29 KB (4391 bytes)
miller.spec          1.83 KB (1870 bytes)
Revision 3 (latest revision is 26)
Dominique Leuenberger (dimstar_suse) accepted request 518550 from Luigi Baldoni (alois) (revision 3)
- Updated license
- Update to version 5.2.2
  * This bugfix release delivers a fix for #147 where a memory
    allocation failed beyond 4GB.
- Update to version 5.2.1
  * Fixes (gh#johnkerl/miller#142) build segfault on non-x86
    architectures
- Update to version 5.2.0
  This release contains mostly feature requests.
  Features:
  * The stats1 verb now lets you use regular expressions to
    specify which field names to compute statistics on, and/or which
    to group by.
  * The min and max DSL functions, and the min/max/percentile
    aggregators for the stats1 and merge-fields verbs, now support
    numeric as well as string field values. (For mixed string/numeric
    fields, numbers compare before strings.) This means in particular
    that order statistics -- min, max, and non-interpolated
    percentiles -- as well as mode, antimode, and count are now
    possible on string-only fields. (Of course, any operations
    requiring arithmetic on values, such as computing sums, averages,
    or interpolated percentiles, yield an error on string-valued
    input.)
  * There is a new DSL function mapexcept which returns a copy of
    the argument with specified key(s), if any, unset. The motivating
    use-case is to split records to multiple filenames depending on
    a particular field value, which is omitted from the output:
      mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'
    Likewise, mapselect returns a copy of the argument with only
    specified key(s), if any, set. This resolves #137.
  * A new -u option for count-distinct allows unlashed counts for
    multiple field names. For example, with -f a,b and without -u,
    count-distinct computes counts for distinct pairs of a and b field
    values. With -f a,b and with -u, it computes counts for distinct a
    field values and counts for distinct b field values separately.
  * If you build from source, you can now do ./configure without
    first doing autoreconf -fiv. This resolves #131.
  * The UTF-8 BOM sequence 0xef 0xbb 0xbf is now automatically
    ignored from the start of CSV files. (The same is already done for
    JSON files.) This resolves #138.
  * For put and filter with -S, program literals such as the 6 in
    $x = 6 are no longer parsed as strings. The -S option for put and
    filter is intended to suppress numeric conversion of record data,
    not program literals. To get the string 6, use $x = "6".
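
The difference between lashed and unlashed counts described for
count-distinct above can be sketched in Python. This is only an
illustration of the semantics, not Miller's implementation; the sample
records and the function names count_lashed and count_unlashed are
hypothetical.

```python
from collections import Counter

# Hypothetical sample records, each a mapping of field name to value.
records = [
    {"a": "x", "b": "1"},
    {"a": "x", "b": "2"},
    {"a": "y", "b": "1"},
    {"a": "x", "b": "1"},
]
fields = ["a", "b"]


def count_lashed(records, fields):
    """Count distinct tuples of values taken across all fields together
    (the behaviour described for -f a,b without -u)."""
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


def count_unlashed(records, fields):
    """Count distinct values of each field separately
    (the behaviour described for -f a,b with -u)."""
    return {f: Counter(r[f] for r in records if f in r) for f in fields}


lashed = count_lashed(records, fields)
unlashed = count_unlashed(records, fields)
```

With these records, the lashed result counts pairs such as ("x", "1"),
while the unlashed result holds one independent tally per field.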
  Documentation:
  * A new cookbook example shows how to compute differences
    between successive queries, e.g. to find out what changed in
    time-varying data when you run and rerun a SQL query.
  * Another new cookbook example shows how to compute
    interquartile ranges.
  * A third new cookbook example shows how to compute weighted
    means.
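
For reference, a weighted mean is the sum of each value times its weight,
divided by the sum of the weights. A minimal Python sketch of that formula
follows; it is not taken from the Miller cookbook, and the function name
weighted_mean is hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```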
  Bugfixes:
  * CRLF line-endings were not being correctly autodetected when
    I/O formats were specified using --c2j et al.
  * Integer division by zero was causing a fatal runtime
    exception, rather than computing inf or nan as in the
    floating-point case.
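
The integer division fix can be illustrated with a small Python sketch
that follows the usual IEEE 754 conventions for division by zero. The
changelog does not spell out the exact values Miller returns, so the
mapping below is an assumption, and the function name int_div is
hypothetical.

```python
import math


def int_div(x: int, y: int) -> float:
    """Divide two integers without raising on a zero divisor.

    Assumed convention (IEEE 754 style): a nonzero numerator over zero
    gives a signed infinity, and zero over zero gives nan.
    """
    if y != 0:
        return x / y
    if x == 0:
        return math.nan
    return math.inf if x > 0 else -math.inf
```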