Conversion of vector to binary format is incorrect or should always be disallowed
Runs on: "Oracle Database 23ai Enterprise Edition Release 23.0.0.0.0 - Production Version 23.6.0.24.10"
Submitted: SR4-0000139499 to My Oracle Cloud Support
Note: Documentation DOES NOT SPECIFY that conversion to binary format is not allowed
Case A: converting a vector to binary format with unspecified dimension count yields incorrect vectors.
Case B: converting a vector to binary format returns an error when dimension count is specified.
If vectors can be converted to binary format then Case A returns invalid results and the error in Case B is incorrect
If vectors cannot be converted to binary format the Case A should return the same error as Case B
Case A) Convert any vector to a binary vector:
The documentation states that non binary vectors are quantized to binary vectors by converting dimensions > 0 to 1 and dimensions <= to 0… this binary representation is then converted to a uint8 for every 8 dimensions.
All of the following vectors should convert to a binary vector of [198]
Using to_vector with binary dimension format results in a binary vector that is just the first dimension
The case is identical if the vectors are float32 or float64 (see example 2 and 3 in attached script)
Notice that the conversion to binary only functions without error if dimensions are unspecified (set to *)
with base (string_vector) as (
values
('[25,32,-10, 0, -7, 4,24, -5]'),
('[43,13,-55, 0,-33,42,21,-15]'),
('[17, 2,-53,-5,-17,21, 5, -3]'),
('[ 1, 1, 0, 0, 0, 1, 1, 0]')
), base_vector as (
select
string_vector
--same incorrect results if dimension format is int8, float32 or float64
, to_vector(string_vector, 8, int8) as actual_vector
from base
)
select
string_vector
, actual_vector
--, vector_dimension_format(actual_vector) as dimension_format
--no errors when dimension count is *
, to_vector(actual_vector, *, binary) as vector_to_binary
, to_vector('[' || bin_to_num(1,1,0,0,0,1,1,0) || ']', 8, binary) as correct_binary_vector
from base_vector
/
STRING_VECTOR ACTUAL_VECTOR VECTOR_TO_BINARY CORRECT_BINARY_VECTOR
------------------------------ ------------------------------ ---------------- ---------------------
[25,32,-10, 0, -7, 4,24, -5] [25,32,-10,0,-7,4,24,-5] [25] [198]
[43,13,-55, 0,-33,42,21,-15] [43,13,-55,0,-33,42,21,-15] [43] [198]
[17, 2,-53,-5,-17,21, 5, -3] [17,2,-53,-5,-17,21,5,-3] [17] [198]
[ 1, 1, 0, 0, 0, 1, 1, 0] [1,1,0,0,0,1,1,0] [1] [198]
Case B) Exception when dimension count is specified during conversion to binary. This will raise:
SQL Error: ORA-51814: Vector of BINARY format cannot have any operation performed with vector of any other type.
with base (string_vector) as (
values
('[25,32,-10, 0, -7, 4,24, -5]'),
('[43,13,-55, 0,-33,42,21,-15]'),
('[17, 2,-53,-5,-17,21, 5, -3]'),
('[ 1, 1, 0, 0, 0, 1, 1, 0]')
), base_vector as (
select
string_vector
--same error if dimension format is int8, float32 or float64
, to_vector(string_vector, 8, int8) as actual_vector
from base
)
select
string_vector
, actual_vector
, vector_dimension_format(actual_vector) as dimension_format
--fails when dimension count is specified
, to_vector(actual_vector, 8, binary) as vector_to_binary
, to_vector('[' || bin_to_num(1,1,0,0,0,1,1,0) || ']', 8, binary) as correct_binary_vector
from base_vector
/
Since the documentation does not say that we cannot convert vectors to binary, this would seem to be a good approach for converting some vectors to a more compact storage format (and the accuracy is claimed to be 90% of the non binary vector). (If all dimensions were always > 0 the conversion would not be a viable approach).
Has anyone else considered this problem? I think it ends up being a bug either way (incorrect conversion or all conversions should raise exceptions).
Regards,
Anthony Harper
simple_forum_example.sql
test_script_to_binary_vector_issues.sql