Data Science & Machine Learning

What is the best way to handle categorical variables with many levels using ore.neural()?

Bilal, Jan 13 2018 (edited Jan 15 2018)

Hi All,

I’m trying to fit a neural network model using ore.neural(). There are around 150 input features, about thirty-three of which are categorical variables such as client, project type, voltage type, and tower type.

For the neural network, I transformed each categorical variable into n columns using one-hot encoding, where n is the number of distinct values of that variable. However, some categorical variables, such as tower type, have approximately 1,700 distinct values. If I follow the same strategy, I’m likely to end up with a data frame comprising thousands of columns of 1s and 0s.
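To make the transformation concrete, here is a minimal sketch in plain Python rather than ORE. It assumes each row is a dict; the function name `one_hot_encode`, the column names, and the sample values are hypothetical and only illustrate how one categorical column becomes one 0/1 column per distinct value.

```python
def one_hot_encode(rows, column):
    """Replace `column` in each row with one 0/1 indicator per distinct value."""
    # Collect the distinct values (levels) of the categorical column.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per level.
        for level in levels:
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded, levels


# Hypothetical sample data, not taken from the real data set.
rows = [
    {"voltage_type": "HV", "tower_type": "T1", "length_km": 12.5},
    {"voltage_type": "LV", "tower_type": "T2", "length_km": 3.0},
    {"voltage_type": "HV", "tower_type": "T3", "length_km": 7.2},
]

encoded, levels = one_hot_encode(rows, "tower_type")
print(levels)      # the distinct tower types found
print(encoded[0])  # first row with tower_type expanded into indicators
```

The number of added columns equals the number of distinct values, so a variable with roughly 1,700 levels would add roughly 1,700 indicator columns under this scheme.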

Is this the recommended way of handling categorical variables with ore.neural() in this use case?

Can I handle categorical variables differently with ore.neural()?

Is there a way to automate this one-hot encoding transformation with ore.neural()? Any example code or ideas?

Details of one-hot encoding can be found at the following link: https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/

Any help in solving this efficiently would be greatly appreciated.

Many thanks and kind regards,

Bilal

This post has been answered by rtiran on Jan 15 2018
