Skip to Main Content

Oracle Database Free

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

23ai: Converting Vector to Binary Dimension Format Is Inconsistent or Incorrect

Anthony HarperNov 23 2024 — edited Nov 23 2024

Conversion of vector to binary format is incorrect or should always be disallowed

Runs on: "Oracle Database 23ai Enterprise Edition Release 23.0.0.0.0 - Production Version 23.6.0.24.10"
Submitted: SR4-0000139499 to My Oracle Cloud Support

Note: Documentation DOES NOT SPECIFY that conversion to binary format is not allowed
Case A: converting a vector to binary format with unspecified dimension count yields incorrect vectors.
Case B: converting a vector to binary format returns an error when dimension count is specified.
If vectors can be converted to binary format then Case A returns invalid results and the error in Case B is incorrect
If vectors cannot be converted to binary format the Case A should return the same error as Case B

Case A) Convert any vector to a binary vector:
The documentation states that non binary vectors are quantized to binary vectors by converting dimensions > 0 to 1 and dimensions <= to 0… this binary representation is then converted to a uint8 for every 8 dimensions.
All of the following vectors should convert to a binary vector of [198]
Using to_vector with binary dimension format results in a binary vector that is just the first dimension
The case is identical if the vectors are float32 or float64 (see example 2 and 3 in attached script)
Notice that the conversion to binary only functions without error if dimensions are unspecified (set to *)

with base (string_vector) as (
values 
('[25,32,-10, 0, -7, 4,24, -5]'),
('[43,13,-55, 0,-33,42,21,-15]'),
('[17, 2,-53,-5,-17,21, 5, -3]'),
('[ 1, 1,  0, 0,  0, 1, 1,  0]')
), base_vector as (
select 
    string_vector
    --same incorrect results if dimension format is int8, float32 or float64
    , to_vector(string_vector, 8, int8) as actual_vector
from base
)
select
    string_vector
    , actual_vector
    --, vector_dimension_format(actual_vector) as dimension_format
    --no errors when dimension count is *
    , to_vector(actual_vector, *, binary) as vector_to_binary
    , to_vector('[' || bin_to_num(1,1,0,0,0,1,1,0) || ']', 8, binary) as correct_binary_vector
from base_vector
/


STRING_VECTOR                  ACTUAL_VECTOR                  VECTOR_TO_BINARY CORRECT_BINARY_VECTOR
------------------------------ ------------------------------ ---------------- ---------------------
[25,32,-10, 0, -7, 4,24, -5]   [25,32,-10,0,-7,4,24,-5]       [25]             [198]                
[43,13,-55, 0,-33,42,21,-15]   [43,13,-55,0,-33,42,21,-15]    [43]             [198]                
[17, 2,-53,-5,-17,21, 5, -3]   [17,2,-53,-5,-17,21,5,-3]      [17]             [198]                
[ 1, 1,  0, 0,  0, 1, 1,  0]   [1,1,0,0,0,1,1,0]              [1]              [198]                

Case B) Exception when dimension count is specified during conversion to binary. This will raise:

SQL Error: ORA-51814: Vector of BINARY format cannot have any operation performed with vector of any other type.


with base (string_vector) as (
values 
('[25,32,-10, 0, -7, 4,24, -5]'),
('[43,13,-55, 0,-33,42,21,-15]'),
('[17, 2,-53,-5,-17,21, 5, -3]'),
('[ 1, 1,  0, 0,  0, 1, 1,  0]')
), base_vector as (
select 
    string_vector
    --same error if dimension format is int8, float32 or float64
    , to_vector(string_vector, 8, int8) as actual_vector
from base
)
select
    string_vector
    , actual_vector
    , vector_dimension_format(actual_vector) as dimension_format
    --fails when dimension count is specified
    , to_vector(actual_vector, 8, binary) as vector_to_binary
    , to_vector('[' || bin_to_num(1,1,0,0,0,1,1,0) || ']', 8, binary) as correct_binary_vector
from base_vector
/

Since the documentation does not say that we cannot convert vectors to binary, this would seem to be a good approach for converting some vectors to a more compact storage format (and the accuracy is claimed to be 90% of the non binary vector). (If all dimensions were always > 0 the conversion would not be a viable approach).

Has anyone else considered this problem? I think it ends up being a bug either way (incorrect conversion or all conversions should raise exceptions).

Regards,

Anthony Harper

simple_forum_example.sql

test_script_to_binary_vector_issues.sql

Comments
Post Details
Added on Nov 23 2024
8 comments
574 views