Oracle Database Free

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

23ai: Converting Vector to Binary Dimension Format Is Inconsistent or Incorrect

Anthony HarperNov 23 2024 — edited Nov 23 2024

Conversion of vector to binary format is incorrect or should always be disallowed

Runs on: "Oracle Database 23ai Enterprise Edition Release 23.0.0.0.0 - Production Version 23.6.0.24.10"
Submitted: SR4-0000139499 to My Oracle Cloud Support

Note: Documentation DOES NOT SPECIFY that conversion to binary format is not allowed
Case A: converting a vector to binary format with unspecified dimension count yields incorrect vectors.
Case B: converting a vector to binary format returns an error when dimension count is specified.
If vectors can be converted to binary format then Case A returns invalid results and the error in Case B is incorrect
If vectors cannot be converted to binary format the Case A should return the same error as Case B

Case A) Convert any vector to a binary vector:
The documentation states that non binary vectors are quantized to binary vectors by converting dimensions > 0 to 1 and dimensions <= to 0… this binary representation is then converted to a uint8 for every 8 dimensions.
All of the following vectors should convert to a binary vector of [198]
Using to_vector with binary dimension format results in a binary vector that is just the first dimension
The case is identical if the vectors are float32 or float64 (see example 2 and 3 in attached script)
Notice that the conversion to binary only functions without error if dimensions are unspecified (set to *)

with base (string_vector) as (
values 
('[25,32,-10, 0, -7, 4,24, -5]'),
('[43,13,-55, 0,-33,42,21,-15]'),
('[17, 2,-53,-5,-17,21, 5, -3]'),
('[ 1, 1,  0, 0,  0, 1, 1,  0]')
), base_vector as (
select 
    string_vector
    --same incorrect results if dimension format is int8, float32 or float64
    , to_vector(string_vector, 8, int8) as actual_vector
from base
)
select
    string_vector
    , actual_vector
    --, vector_dimension_format(actual_vector) as dimension_format
    --no errors when dimension count is *
    , to_vector(actual_vector, *, binary) as vector_to_binary
    , to_vector('[' || bin_to_num(1,1,0,0,0,1,1,0) || ']', 8, binary) as correct_binary_vector
from base_vector
/


STRING_VECTOR                  ACTUAL_VECTOR                  VECTOR_TO_BINARY CORRECT_BINARY_VECTOR
------------------------------ ------------------------------ ---------------- ---------------------
[25,32,-10, 0, -7, 4,24, -5]   [25,32,-10,0,-7,4,24,-5]       [25]             [198]                
[43,13,-55, 0,-33,42,21,-15]   [43,13,-55,0,-33,42,21,-15]    [43]             [198]                
[17, 2,-53,-5,-17,21, 5, -3]   [17,2,-53,-5,-17,21,5,-3]      [17]             [198]                
[ 1, 1,  0, 0,  0, 1, 1,  0]   [1,1,0,0,0,1,1,0]              [1]              [198]

Case B) Exception when dimension count is specified during conversion to binary. This will raise:

SQL Error: ORA-51814: Vector of BINARY format cannot have any operation performed with vector of any other type.


with base (string_vector) as (
values 
('[25,32,-10, 0, -7, 4,24, -5]'),
('[43,13,-55, 0,-33,42,21,-15]'),
('[17, 2,-53,-5,-17,21, 5, -3]'),
('[ 1, 1,  0, 0,  0, 1, 1,  0]')
), base_vector as (
select 
    string_vector
    --same error if dimension format is int8, float32 or float64
    , to_vector(string_vector, 8, int8) as actual_vector
from base
)
select
    string_vector
    , actual_vector
    , vector_dimension_format(actual_vector) as dimension_format
    --fails when dimension count is specified
    , to_vector(actual_vector, 8, binary) as vector_to_binary
    , to_vector('[' || bin_to_num(1,1,0,0,0,1,1,0) || ']', 8, binary) as correct_binary_vector
from base_vector
/

Since the documentation does not say that we cannot convert vectors to binary, this would seem to be a good approach for converting some vectors to a more compact storage format (and the accuracy is claimed to be 90% of the non binary vector). (If all dimensions were always > 0 the conversion would not be a viable approach).

Has anyone else considered this problem? I think it ends up being a bug either way (incorrect conversion or all conversions should raise exceptions).

Regards,

Anthony Harper

simple_forum_example.sql

test_script_to_binary_vector_issues.sql

Added on Nov 23 2024

#23ai, #aivectorsearch, #binaryvector, #vectorformatconversion

8 comments

310 views

3 mentions

Oracle Database Free

23ai: Converting Vector to Binary Dimension Format Is Inconsistent or Incorrect

Comments

Post Details