Name-indexed data processing tool
http://johnkerl.org/miller/doc
Miller (mlr) allows name-indexed data such as CSV and JSON files to be
processed with functions equivalent to sed, awk, cut, join, sort etc. It can
convert between formats, preserves headers when sorting or reversing, and
streams data where possible so its memory requirements stay small. It works
well with pipes and can read from "tail -f".
- Developed at utilities
- Sources inherited from project openSUSE:Factory
- 1 derived package
- Checkout Package
osc -A https://api.opensuse.org checkout openSUSE:Factory:PowerPC/miller && cd $_
Source Files
Filename | Size (bytes) | Size |
---|---|---|
miller-5.2.2.tar.gz | 5024653 | 4.79 MB |
miller.changes | 4391 | 4.29 KB |
miller.spec | 1870 | 1.83 KB |
Revision 3 (latest revision is 26)
Dominique Leuenberger (dimstar_suse) accepted request 518550 from Luigi Baldoni (alois) (revision 3)
- Updated license
- Update to 5.2.2
  * This bugfix release delivers a fix for #147 where a memory allocation failed beyond 4GB.
- Update to version 5.2.1
  * Fixes (gh#johnkerl/miller#142) build segfault on non-x86 architectures
- Update to version 5.2.0
  This release contains mostly feature requests.
  Features:
  * The stats1 verb now lets you use regular expressions to specify which field names to compute statistics on, and/or which to group by. Full details are here.
  * The min and max DSL functions, and the min/max/percentile aggregators for the stats1 and merge-fields verbs, now support numeric as well as string field values. (For mixed string/numeric fields, numbers compare before strings.) This means in particular that order statistics -- min, max, and non-interpolated percentiles -- as well as mode, antimode, and count are now possible on string-only fields. (Of course, any operations requiring arithmetic on values, such as computing sums, averages, or interpolated percentiles, yield an error on string-valued input.)
  * There is a new DSL function mapexcept which returns a copy of the argument with specified key(s), if any, unset. The motivating use-case is to split records to multiple filenames depending on particular field value, which is omitted from the output:
      mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'
    Likewise, mapselect returns a copy of the argument with only specified key(s), if any, set. This resolves #137.
  * A new -u option for count-distinct allows unlashed counts for multiple field names. For example, with -f a,b and without -u, count-distinct computes counts for distinct pairs of a and b field values. With -f a,b and with -u, it computes counts for distinct a field values and counts for distinct b field values separately.
  * If you build from source, you can now do ./configure without first doing autoreconf -fiv. This resolves #131.
  * The UTF-8 BOM sequence 0xef 0xbb 0xbf is now automatically ignored from the start of CSV files. (The same is already done for JSON files.) This resolves #138.
  * For put and filter with -S, program literals such as the 6 in $x = 6 were being parsed as strings. This is not sensible, since the -S option for put and filter is intended to suppress numeric conversion of record data, not program literals. To get string 6 one may use $x = "6".
  Documentation:
  * A new cookbook example shows how to compute differences between successive queries, e.g. to find out what changed in time-varying data when you run and rerun a SQL query.
  * Another new cookbook example shows how to compute interquartile ranges.
  * A third new cookbook example shows how to compute weighted means.
  Bugfixes:
  * CRLF line-endings were not being correctly autodetected when I/O formats were specified using --c2j et al.
  * Integer division by zero was causing a fatal runtime exception, rather than computing inf or nan as in the floating-point case.
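The difference between lashed and unlashed counts in the count-distinct entry above can be illustrated with a small Python sketch. This is not Miller's implementation. The sample records and the two helper functions are hypothetical, chosen only to show the semantics the changelog describes for -f a,b with and without -u.

```python
from collections import Counter

# Hypothetical sample records standing in for name-indexed input data.
records = [
    {"a": "pan", "b": "wye"},
    {"a": "pan", "b": "zee"},
    {"a": "eks", "b": "wye"},
    {"a": "pan", "b": "wye"},
]


def count_distinct_lashed(records, fields):
    """Count distinct tuples of field values (the -f a,b case without -u)."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def count_distinct_unlashed(records, fields):
    """Count distinct values for each field separately (the -f a,b -u case)."""
    return {f: Counter(r[f] for r in records) for f in fields}


lashed = count_distinct_lashed(records, ["a", "b"])
unlashed = count_distinct_unlashed(records, ["a", "b"])

print(lashed)    # one count per distinct (a, b) pair
print(unlashed)  # one set of counts for a, another for b
```

With the sample data, the lashed form yields three distinct (a, b) pairs, while the unlashed form reports counts for the values of a and the values of b independently.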