View Issue Details

ID: 0000613
Project: file
Category: general
View Status: public
Last Update: 2017-05-08 21:40
Reporter: Alexander Belopolsky
Assigned To: Christos Zoulas
Priority: normal
Severity: feature
Reproducibility: always
Status: assigned
Resolution: open
Product Version:
Target Version:
Fixed in Version:
Summary: 0000613: Add patterns for kdb+ data files
Description: Kdb+ is a database system available from kx.com. It uses flat binary files that start with a magic byte 0xFF or 0xFE. The attached file defines the patterns needed for the "file" command to parse kdb+ data files. For more information, see <https://github.com/enlnt/kdb-magic>.
Steps To Reproduce:
curl https://codeload.github.com/enlnt/kdb-magic/zip/v1.0 -o kdb-magic-1.0.zip
unzip kdb-magic-1.0.zip
file kdb-magic-1.0/test/data/*
Tags: feature

Activities

Alexander Belopolsky

2017-05-04 23:22

reporter  

magic (2,443 bytes)
0	string	\xFF	kdb+ data file version 2
>2      byte    =11 symbol

>1      byte    0x01
>>2      byte    =0x00 list
>>2      byte    =0xff boolean scalar
>>2      byte    =0xfe guid scalar
>>2      byte    =0xfc byte scalar
>>2      byte    =0xfb short scalar
>>2      byte    =0xfa int scalar
>>2      byte    =0xf9 long scalar
>>2      byte    =0xf8 real scalar
>>2      byte    =0xf7 float scalar
>>2      byte    =0xf6 char scalar
>>2      byte    =0xf5 symbol scalar
>>2      byte    =0xf4 timestamp scalar
>>2      byte    =0xf3 month scalar
>>2      byte    =0xf2 date scalar
>>2      byte    =0xf1 datetime scalar
>>2      byte    =0xf0 timespan scalar
>>2      byte    =0xef minute scalar
>>2      byte    =0xee second scalar
>>2      byte    =0xed time scalar

>>2      byte    =98 table
>>2      byte    =99 dict

>>3      byte    =1  sorted
>>3      byte    =2  unique
>>3      byte    =3  partitioned
>>3      byte    =4  grouped

0	string	\xFE	kdb+ data file version 3
>1      byte    0x20
>>2      byte    =1  boolean
>>2      byte    =2  guid
>>2      byte    =4  byte
>>2      byte    =5  short
>>2      byte    =6  int
>>2      byte    =7  long
>>2      byte    =8  real
>>2      byte    =9  float
>>2      byte    =10 char
>>2      byte    =11 symbol
>>2      byte    =12 timestamp
>>2      byte    =13 month
>>2      byte    =14 date
>>2      byte    =15 datetime
>>2      byte    =16 timespan
>>2      byte    =17 minute
>>2      byte    =18 second
>>2      byte    =19 time

>>2      byte    =78  nested boolean
>>2      byte    =79  nested guid
>>2      byte    =81  nested byte
>>2      byte    =82  nested short
>>2      byte    =83  nested int
>>2      byte    =84  nested long
>>2      byte    =85  nested real
>>2      byte    =86  nested float
>>2      byte    =87  nested char
>>2      byte    =89  nested timestamp
>>2      byte    =90  nested month
>>2      byte    =91  nested date
>>2      byte    =92  nested datetime
>>2      byte    =93  nested timespan
>>2      byte    =94  nested minute
>>2      byte    =95  nested second
>>2      byte    =96  nested time

>>3      byte    =1  sorted
>>3      byte    =2  unique
>>3      byte    =3  partitioned
>>3      byte    =4  grouped

>1      byte    >0x20 enum
>>1     string  x     by %s
>>11     byte    =1  sorted
>>11     byte    =2  unique
>>11     byte    =3  partitioned
>>11     byte    =4  grouped

0       string  kxzipped kdb+ data file, compressed
Christos Zoulas

2017-05-08 20:19

manager   ~0001515

This magic is based on 1 byte of information (it is too weak) and will produce spurious matches. Pity that the kdb folks did not choose a longer magic number.
Alexander Belopolsky

2017-05-08 20:56

reporter   ~0001517

In most cases the 0xFF magic byte is followed by 0x01, and 0xFE by 0x20. The exception is the so-called enum type, where 0xFE is followed by a null-terminated ASCII string.
What are the specific spurious matches that you are concerned about? I can try to think of a way to tighten up the kdb patterns.
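
A minimal sketch of the two-byte check described in this note, written in Python for illustration only. The function name `kdb_header_kind` and the returned labels are hypothetical; the labels borrow the "version 2" / "version 3" wording from the attached magic file. The enum test (a non-empty run of printable, non-space ASCII bytes ended by a null byte) is an assumption about how "null-terminated ascii string" could be tightened, not something the kdb+ format is known to guarantee.

```python
from typing import Optional


def kdb_header_kind(data: bytes) -> Optional[str]:
    """Classify a buffer by the two-byte prefixes discussed in this issue.

    Returns a label, or None when the prefix does not look like kdb+.
    """
    if len(data) < 2:
        return None
    first, second = data[0], data[1]

    # 0xFF followed by 0x01 (labelled "version 2" in the attached magic).
    if first == 0xFF and second == 0x01:
        return "version 2"

    # 0xFE followed by 0x20 (labelled "version 3" in the attached magic).
    if first == 0xFE and second == 0x20:
        return "version 3"

    # Enum case: 0xFE followed by a null-terminated ASCII string.
    # Assumption: the name is non-empty, printable, non-space ASCII.
    if first == 0xFE:
        end = data.find(b"\x00", 1)
        if end > 1 and all(0x20 < b < 0x7F for b in data[1:end]):
            return "version 3 enum"

    return None
```

Requiring the second byte rejects buffers that merely begin with 0xFF or 0xFE, which is the spurious-match concern raised above.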
Christos Zoulas

2017-05-08 21:40

manager   ~0001518

0xff01 and 0xfe20 are 16 bits of magic: still weak, but acceptable. There is lots of magic with 0xfe or 0xff at offset 0 (MPEG stuff, MSX, SQL). To see it, run these in Magdir:
grep ^0 * | grep -i 0xfe
grep ^0 * | grep -i 0xff
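
For anyone without grep to hand, the same search can be approximated in Python. This is a rough sketch, not part of the file distribution: the function name `offset0_entries` is hypothetical, and like the grep pipeline it simply keeps lines that begin with the character "0" and contain the given text, ignoring case.

```python
from pathlib import Path


def offset0_entries(magdir, needle):
    """Return (file name, line) pairs for lines starting with "0" that
    contain `needle` case-insensitively, across the files in `magdir`."""
    hits = []
    wanted = needle.lower()
    for path in sorted(Path(magdir).iterdir()):
        if not path.is_file():
            continue
        # latin-1 decodes any byte sequence, so odd bytes cannot raise here.
        text = path.read_text(encoding="latin-1")
        for line in text.splitlines():
            if line.startswith("0") and wanted in line.lower():
                hits.append((path.name, line))
    return hits
```

Calling `offset0_entries("Magdir", "0xfe")` from the directory that contains Magdir would list the candidate collisions, assuming that relative path exists.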

Issue History

Date Modified Username Field Change
2017-05-04 23:22 Alexander Belopolsky New Issue
2017-05-04 23:22 Alexander Belopolsky File Added: magic
2017-05-04 23:37 Alexander Belopolsky Tag Attached: feature
2017-05-08 20:19 Christos Zoulas Note Added: 0001515
2017-05-08 20:19 Christos Zoulas Assigned To => Christos Zoulas
2017-05-08 20:19 Christos Zoulas Status new => assigned
2017-05-08 20:19 Christos Zoulas Status assigned => feedback
2017-05-08 20:56 Alexander Belopolsky Note Added: 0001517
2017-05-08 20:56 Alexander Belopolsky Status feedback => assigned
2017-05-08 21:40 Christos Zoulas Note Added: 0001518