Predicting continuous outcomes, such as sales forecasting or real estate pricing.
Categorizing data into predefined groups, like customer churn prediction or disease diagnosis.
Predicting future values based on historical time series data, like stock market trends or demand forecasting.
What is image classification? Basics you need to know
Exploring Object Detection Applications and Benefits
Object Segmentation
Football's video analysis revolution: From the top clubs to the masses
Sentiment Analysis: A Definitive Guide
Text Classification
🇺🇸: I'm a cat.
↓
🇨🇳: 我是一只猫。
🇩🇪: Ich bin eine Katze.
🇯🇵: 吾輩は猫である。
Named Entity Recognition and Classification with Scikit-Learn
Getting Started With Reinforcement Learning
Getting Started With OpenAI Gym: The Basic Building Blocks
Identifying fraudulent activities in financial transactions.
Detecting malicious activities in network traffic.
Animated Machine Learning Classifiers
Building an End-to-End Logistic Regression Model
File:K-means convergence.gif
If you know that many people take a taxi to the airport, you can add the distance to the airport as a feature.
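As a rough illustration of such a feature, the sketch below computes the great-circle (haversine) distance from each drop-off point to an airport. The airport coordinates, the sample drop-off points, and the names `haversine_km` and `dist_to_airport` are assumptions made up for this example; they do not come from any particular dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical airport location and hypothetical drop-off coordinates (lat, lon).
AIRPORT = (40.6413, -73.7781)
dropoffs = [(40.7580, -73.9855), (40.6413, -73.7781)]

# The engineered feature: one distance value per trip.
dist_to_airport = [haversine_km(lat, lon, *AIRPORT) for lat, lon in dropoffs]
```

With a DataFrame, the same function could be applied row by row to the latitude and longitude columns and stored as a new column, in the same way as the other engineered features in this article.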
df.describe()
# count mean std min 25% 50% 75% max
# var1 1000.0 12.542000 6.735307 0.0 8.00 12.0 18.0 24.0
# var2 1000.0 0.255000 0.435941 0.0 0.00 0.0 1.0 1.0
df.isnull().sum()
# var1 12
# var2 3
# dtype: int64
df.corr()
# var1 var2
# var1 1.000000 -0.121675
# var2 -0.121675 1.000000
df['Column3'].value_counts(dropna=False)  # dropna=False so that NaN is counted too
# True 500
# False 490
# NaN 10
df['Column2'].unique()
# array([5, 6, 7, ..., 53, 54, 55])
df['Column2'].hist()
df.boxplot(column=["Column1", "Column2"])
df.plot.scatter(x='Column1', y='Column2')
import seaborn as sns
sns.heatmap(df.corr(), annot=True)
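# Fill missing values (fillna returns a new DataFrame, so assign the result)
df = df.fillna(value)
# Drop rows with any missing value (use axis=1 to drop columns instead)
df = df.dropna(axis=0, how='any')
# Convert a column to another dtype
df['column'] = df['column'].astype('dtype')
# Keep only the rows below the outlier threshold
df = df[df['column'] < upper_limit]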
from sklearn.preprocessing import MinMaxScaler, StandardScaler
scaler = MinMaxScaler() # or StandardScaler()
df_scaled = scaler.fit_transform(df)
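import pandas as pd
# Using get_dummies (one-hot encoding; returns a new DataFrame)
df = pd.get_dummies(df, columns=['categorical_column'])
# Alternative: using category codes (one integer per category)
df['categorical_column'] = df['categorical_column'].astype('category').cat.codes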
df['new_feature'] = df['column1'] / df['column2']
df['year'] = df['datetime_column'].dt.year
from sklearn.decomposition import PCA
pca = PCA(n_components=k)
df_reduced = pca.fit_transform(df)
from sklearn.model_selection import train_test_split
X = df.drop('target', axis=1)  # features
y = df['target']               # labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
from sklearn.model_selection import cross_val_score, KFold
kf = KFold(n_splits=5)
scores = cross_val_score(model, X, y, cv=kf)
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5)
scores = cross_val_score(model, X, y, cv=skf)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)
from sklearn.metrics import precision_score, recall_score, f1_score
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
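To make the three scores above concrete, here is a small worked example that derives them by hand from true-positive, false-positive, and false-negative counts for a binary problem. The label lists are made-up illustration data, and this is only a sketch of the standard definitions, not a replacement for the scikit-learn functions.

```python
# Hypothetical binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 0.75.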
from sklearn.metrics import roc_auc_score
roc_auc = roc_auc_score(y_true, y_scores)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
rmse = mse ** 0.5  # RMSE is the square root of MSE; squared=False was deprecated and later removed in newer scikit-learn releases
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_true, y_pred)
from sklearn.metrics import log_loss
logloss = log_loss(y_true, y_pred_probs)
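# Simple averaging of numeric predictions (for classifiers, average predict_proba outputs instead of labels)
predictions = (model1.predict(X_test) + model2.predict(X_test) + model3.predict(X_test)) / 3
# Weighted averaging: the weights should sum to 1
weights = [0.3, 0.4, 0.3]
predictions = weights[0]*model1.predict(X_test) + weights[1]*model2.predict(X_test) + weights[2]*model3.predict(X_test)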
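from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
# `estimator` replaced the older `base_estimator` argument in scikit-learn 1.2
bagging_model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100)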
bagging_model.fit(X_train, y_train)
predictions = bagging_model.predict(X_test)
from sklearn.ensemble import GradientBoostingClassifier
boosting_model = GradientBoostingClassifier(n_estimators=100)
boosting_model.fit(X_train, y_train)
predictions = boosting_model.predict(X_test)
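from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
# Base models whose predictions feed the final estimator
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('svr', make_pipeline(StandardScaler(), LinearSVC(random_state=42)))
]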
stacking_model = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stacking_model.fit(X_train, y_train)
predictions = stacking_model.predict(X_test)