Feature Engineering

What is Feature Engineering?

  • Definition: Feature engineering is the process of using domain knowledge to extract features from raw data.
  • Goal: Improve the performance of machine learning models by creating better input representations.

Why is Feature Engineering Important?

  • Enhances Model Performance: Well-engineered features can significantly improve model accuracy.
  • Reduces Complexity: Can lower the dimensionality of the data by eliminating redundant features.
  • Transforms Raw Data: Converts raw data into representations usable by the model.

Feature Engineering Techniques

Polynomial Features

Creating new features by raising existing features to a power.
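
A minimal sketch of polynomial features with pandas. The column name "x" and the toy values are assumptions made for illustration, not part of the original slides.

```python
import pandas as pd

# Hypothetical toy data; "x" is an illustrative column name.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

# Add x^2 and x^3 as new columns next to the original feature.
for degree in (2, 3):
    df[f"x_pow{degree}"] = df["x"] ** degree
```

For many columns at once, scikit-learn's PolynomialFeatures transformer can generate the powers (and cross terms) in one step.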

Interaction Features

Creating new features by combining two or more existing features, for example by multiplying or dividing them.
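
One possible illustration of interaction features, assuming a hypothetical table with "width" and "height" columns: their product and ratio become new features that neither column expresses alone.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative.
df = pd.DataFrame({"width": [2.0, 3.0, 4.0], "height": [5.0, 6.0, 7.0]})

# Product of two features (an "area"-style interaction).
df["area"] = df["width"] * df["height"]

# Ratio of two features is another common way to combine them.
df["aspect_ratio"] = df["width"] / df["height"]
```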

Normalization and Standardization

  • Normalization: Scaling features to a range, typically [0, 1].

    • Formula: x' = (x - x_min) / (x_max - x_min)
  • Standardization: Scaling features to have zero mean and unit variance.

    • Formula: z = (x - μ) / σ, where μ is the feature's mean and σ is its standard deviation
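
Both kinds of scaling can be sketched in a few lines of pandas. The sample values are made up for illustration, and the population standard deviation (ddof=0) is an assumption; pandas defaults to the sample standard deviation (ddof=1).

```python
import pandas as pd

# Hypothetical feature values.
x = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max normalization: rescale to the range [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization: subtract the mean, divide by the standard deviation.
# ddof=0 uses the population standard deviation (an assumption here).
x_std = (x - x.mean()) / x.std(ddof=0)
```

In practice the minimum, maximum, mean, and standard deviation are usually computed on the training data only and then reused to scale validation and test data.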

Log Transformation

  • Definition: Applying a logarithmic transformation to a feature to reduce right (positive) skewness.
  • Example:
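
A small sketch of a log transformation, assuming a hypothetical right-skewed "income" feature. It uses log(1 + x) via np.log1p so that zero values would also be handled; that choice is an assumption, not something the slides specify.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values: one very large income dominates.
income = pd.Series([20_000, 35_000, 50_000, 80_000, 1_000_000], dtype=float)

# log1p computes log(1 + x), which is defined at x = 0.
log_income = np.log1p(income)

# Sample skewness before and after the transformation.
skew_before = income.skew()
skew_after = log_income.skew()

# expm1 inverts log1p if the original scale is needed again.
recovered = np.expm1(log_income)
```

Comparing skew_before with skew_after shows how much the transformation pulls in the long right tail.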

One-Hot Encoding

  • Definition: Converting a categorical feature into numerical format by creating one binary column per category.
  • Example:
    • Converting animal types (dog, cat, penguin) into binary columns.

Binning

  • Definition: Converting continuous features into categorical ones by dividing the range into intervals.
  • Example:
    • Converting age into age groups: 0-18, 19-35, 36-50, 51+.
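
The age-group example can be sketched with pandas' pd.cut. The sample ages are invented for illustration; the bin edges follow the groups listed above.

```python
import pandas as pd

# Hypothetical ages, including values on the group boundaries.
ages = pd.Series([5, 18, 19, 35, 36, 50, 51, 80])

# Right-inclusive edges: (0, 18], (18, 35], (35, 50], (50, inf).
bins = [0, 18, 35, 50, float("inf")]
labels = ["0-18", "19-35", "36-50", "51+"]

# include_lowest=True makes the first interval include age 0 as well.
age_group = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
```

The result is a categorical Series, which can then be one-hot encoded like any other categorical feature.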

One-Hot Encoding

  • One-hot encoding is a form of feature engineering that converts categorical data into a numerical format.
  • It creates a binary column for each category in the original feature. The binary column has a value of 1 if the category is present and 0 otherwise.
  • This lets us use categorical data in machine learning models that require numerical input.

One-Hot Encoding Example

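import pandas as pd

# One binary column is created per distinct value of "animal".
data = pd.DataFrame({"animal": ["dog", "cat", "dog", "penguin"]})
# dtype=int yields 0/1 columns; pandas 2.0+ returns True/False by default.
pd.get_dummies(data, dtype=int)
   animal_cat  animal_dog  animal_penguin
0           0           1               0
1           1           0               0
2           0           1               0
3           0           0               1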

Exercise

https://shorturl.at/SUNVD