Revolutionizing Industries: Detailed Case Studies of AI and Machine Learning Applications with Code Examples

Below, each example is explained line by line, along with why each step is necessary:

1. Healthcare: IBM Watson for Oncology (using BERT)

Problem: Oncologists need to process vast amounts of medical research and patient data to provide personalized cancer treatment recommendations.

Solution: IBM Watson for Oncology uses natural language processing (NLP) and machine learning to analyze patient medical records, research papers, clinical trial data, and treatment guidelines.

Example Model: BERT (Bidirectional Encoder Representations from Transformers)

Sample Code: Using BERT for document classification

from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, Dataset
import torch
  • Import necessary libraries:
  • BertTokenizer and BertForSequenceClassification are imported from the transformers library to handle tokenization and the classification model.
  • DataLoader and Dataset from torch.utils.data are used for handling and batching data.
  • torch is the main PyTorch library.
texts = ["Patient diagnosed with stage II breast cancer. Recommended treatment: chemotherapy.",
         "Patient has a history of diabetes. Monitor glucose levels regularly."]
labels = [1, 0]
  • Sample medical text and labels:
  • texts is a list of sample medical texts.
  • labels is a list of labels indicating whether the text contains relevant treatment information (1 for relevant, 0 for irrelevant).
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512)
  • Tokenization:
  • BertTokenizer.from_pretrained('bert-base-uncased') loads a pre-trained BERT tokenizer.
  • tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512) tokenizes the input texts, pads them to the same length, truncates if necessary, and converts them to PyTorch tensors.
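To make padding=True concrete, here is a toy sketch in plain Python of what the tokenizer returns alongside input_ids: every sequence is padded to the length of the longest one in the batch, and an attention_mask marks real tokens with 1 and padding with 0 so BERT ignores the padding. The token IDs are made up for illustration; we assume the bert-base-uncased vocabulary, where [PAD] is 0, [CLS] is 101 and [SEP] is 102. The helper name pad_batch is hypothetical, and truncation is not modeled.

```python
from typing import List, Tuple

PAD_ID = 0  # assumption: the [PAD] token id in the bert-base-uncased vocabulary

def pad_batch(sequences: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-ID sequences to the longest length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)             # real tokens, then padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = attend, 0 = ignore
    return input_ids, attention_mask

# Hypothetical token IDs: 101 and 102 stand for [CLS] and [SEP]; the middle IDs are made up.
ids, mask = pad_batch([[101, 5776, 2007, 102], [101, 2381, 102]])
print(ids)   # [[101, 5776, 2007, 102], [101, 2381, 102, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```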
class MedicalDataset(Dataset):
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.inputs.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item
  • Custom dataset class:
  • MedicalDataset inherits from Dataset and handles the custom dataset creation.
  • __init__ initializes the inputs and labels.
  • __len__ returns the length of the dataset.
  • __getitem__ retrieves a single data item at the given index.
dataset = MedicalDataset(inputs, labels)
dataloader = DataLoader(dataset, batch_size=2)
  • Create dataset and dataloader:
  • dataset is an instance of MedicalDataset.
  • dataloader is a DataLoader instance that batches the data, here with a batch size of 2.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
  • Load pre-trained model:
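  • BertForSequenceClassification.from_pretrained('bert-base-uncased') loads the pre-trained BERT encoder and adds a classification head with two output labels by default, matching the 0/1 labels above. The head itself is newly initialized (transformers prints a warning about this), so the model must be fine-tuned before its predictions are meaningful.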
for batch in dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()  # compute gradients
    # Optimizer step would go here
  • Training loop:
  • Iterates over batches of data from the dataloader.
  • model(**batch) passes the batch through the model.
  • outputs.loss retrieves the loss from the model output.
  • loss.backward() computes the gradients for backpropagation.
  • An optimizer step would typically follow to update the model weights (not included here for simplicity).

All together –

from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, Dataset
import torch

# Sample medical text
texts = ["Patient diagnosed with stage II breast cancer. Recommended treatment: chemotherapy.",
         "Patient has a history of diabetes. Monitor glucose levels regularly."]

# Labels for the text (1 for relevant treatment information, 0 for irrelevant)
labels = [1, 0]

# Tokenization
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Create a custom dataset
class MedicalDataset(Dataset):
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.inputs.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

dataset = MedicalDataset(inputs, labels)
dataloader = DataLoader(dataset, batch_size=2)

# Load model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Training loop (simplified)
for batch in dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()  # compute gradients
    # Optimizer step would go here
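Once the model is fine-tuned, outputs.logits holds one raw score per class for each text, and the predicted label is simply the class with the highest score. In PyTorch that is outputs.logits.argmax(dim=-1), with torch.softmax(outputs.logits, dim=-1) giving probabilities (typically run after model.eval() and under torch.no_grad()). The sketch below shows the same arithmetic in plain Python; the logits are hypothetical values, not output from the model above.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(batch_logits: List[List[float]]) -> List[int]:
    """The predicted label is the index of the largest logit in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in batch_logits]

# Hypothetical logits for the two sample texts; column 0 = irrelevant, column 1 = relevant.
batch_logits = [[-1.2, 2.3], [1.8, -0.4]]
probs = [softmax(row) for row in batch_logits]
print(predict_labels(batch_logits))     # [1, 0]
print([round(p[1], 3) for p in probs])  # probability of "relevant" for each text
```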

2. Finance: JPMorgan Chase’s COiN Platform (using SpaCy)

Problem: Reviewing commercial loan agreements is time-consuming and prone to human error.

Solution: The Contract Intelligence (COiN) platform uses machine learning to analyze and extract critical data from legal documents quickly and accurately.

Example Model: Named Entity Recognition (NER) using SpaCy

Sample Code: Using SpaCy for NER in legal documents

import spacy
  • Import SpaCy library:
  • spacy is a popular library for NLP tasks.
nlp = spacy.load("en_core_web_sm")
  • Load SpaCy model:
  • spacy.load("en_core_web_sm") loads a pre-trained SpaCy model for English, which includes tokenization, part-of-speech tagging, and named entity recognition.
text = "This Loan Agreement is made between ABC Corporation and XYZ Bank on January 1, 2023."
  • Sample legal text:
  • text is a string containing a sample legal document.
doc = nlp(text)
  • Process text:
  • nlp(text) processes the text and creates a SpaCy Doc object that contains linguistic annotations.
for ent in doc.ents:
    print(ent.text, ent.label_)
  • Extract and print entities:
  • Iterates over the named entities in the document.
  • ent.text retrieves the text of the entity.
  • ent.label_ retrieves the label (type) of the entity (e.g., ORG, DATE).

All Together –

import spacy

# Load SpaCy model
nlp = spacy.load("en_core_web_sm")

# Sample legal text
text = "This Loan Agreement is made between ABC Corporation and XYZ Bank on January 1, 2023."

# Process text
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)

# Output:
# ABC Corporation ORG
# XYZ Bank ORG
# January 1, 2023 DATE
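Extracted entities become useful once they are mapped onto the fields a reviewer cares about. As a minimal sketch, assuming a hypothetical contract schema with "parties" and "dates" fields and a deliberately naive rule (every ORG is a party, every DATE is a candidate date), the output above can be turned into a structured record. The entity list is hard-coded to mirror that output; with spaCy you would build it as [(ent.text, ent.label_) for ent in doc.ents].

```python
from typing import Dict, List, Tuple

def to_contract_record(entities: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, label) pairs into a simple contract record (hypothetical schema)."""
    record: Dict[str, List[str]] = {"parties": [], "dates": []}
    for text, label in entities:
        if label == "ORG":
            record["parties"].append(text)  # naive rule: organizations are contracting parties
        elif label == "DATE":
            record["dates"].append(text)    # naive rule: dates are candidate effective dates
    return record

# Hard-coded to mirror the spaCy output shown above.
entities = [("ABC Corporation", "ORG"), ("XYZ Bank", "ORG"), ("January 1, 2023", "DATE")]
record = to_contract_record(entities)
print(record)  # {'parties': ['ABC Corporation', 'XYZ Bank'], 'dates': ['January 1, 2023']}
```

A production system would need far more than label matching (for example, telling a borrower from a lender), but the shape of the output, named fields filled from entity spans, is the same.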

3. Retail: Amazon’s Recommendation System (using Surprise)

Problem: Personalizing the shopping experience for millions of users to increase customer satisfaction and sales.

Solution: Amazon uses collaborative filtering and deep learning algorithms to analyze user behavior and preferences, providing personalized product recommendations.

Example Model: Matrix Factorization using Surprise

Sample Code: Building a recommendation system with the Surprise library

from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse
  • Import necessary libraries:
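  • Dataset, Reader, and SVD from surprise for handling data and the SVD algorithm (Reader is only needed when loading your own rating files, so it goes unused with the built-in dataset).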
  • train_test_split for splitting the dataset.
  • rmse for evaluating the model.
data = Dataset.load_builtin('ml-100k')
  • Load sample data:
  • Dataset.load_builtin('ml-100k') loads the built-in MovieLens 100k dataset for collaborative filtering tasks, offering to download it on first use.
trainset, testset = train_test_split(data, test_size=0.25)
  • Split data into training and test sets:
  • train_test_split(data, test_size=0.25) splits the dataset into 75% training and 25% test data.
algo = SVD()
  • Initialize SVD algorithm:
  • SVD() creates an instance of the SVD algorithm for collaborative filtering.
algo.fit(trainset)
  • Train the algorithm:
  • algo.fit(trainset) trains the SVD model on the training dataset.
predictions = algo.test(testset)
rmse(predictions)
  • Test the algorithm:
  • algo.test(testset) generates predictions on the test dataset.
  • Calculate RMSE:
  • rmse(predictions) calculates the Root Mean Squared Error (RMSE) to evaluate the model’s accuracy.
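RMSE is the square root of the average squared difference between the true and the estimated ratings, so larger misses are penalized more than small ones. The sketch below computes it by hand. It defines a local stand-in tuple with the same field names as Surprise's Prediction (uid, iid, r_ui, est, details); the user and item IDs and the ratings are made up.

```python
import math
from collections import namedtuple

# Local stand-in with the same field names as Surprise's Prediction tuple.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

def rmse_by_hand(predictions) -> float:
    """Root Mean Squared Error: square each error, average, then take the square root."""
    squared_errors = [(p.r_ui - p.est) ** 2 for p in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings: true rating r_ui vs. estimated rating est.
preds = [
    Prediction("u1", "i1", 4.0, 3.5, {}),
    Prediction("u2", "i1", 3.0, 3.5, {}),
    Prediction("u3", "i2", 1.0, 2.0, {}),
]
print(round(rmse_by_hand(preds), 4))  # 0.7071
```

Here the squared errors are 0.25, 0.25 and 1.0, whose mean is 0.5, and sqrt(0.5) is about 0.7071.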
user_id = str(196)
item_id = str(302)
predicted_rating = algo.predict(user_id, item_id)
print(predicted_rating)
  • Make a prediction for a specific user and item:
  • algo.predict(user_id, item_id) predicts the rating for a given user and item.
  • print(predicted_rating) prints the predicted rating.
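Surprise documents its SVD as biased matrix factorization: the estimated rating for user u and item i is mu + b_u + b_i + q_i · p_u (global mean, user bias, item bias, and the dot product of item and user factor vectors), learned by stochastic gradient descent on regularized squared error. To show roughly what algo.fit and algo.predict compute, here is a minimal pure-Python sketch of that technique on made-up ratings. The hyperparameters (2 factors, 200 epochs, learning rate 0.05) are assumptions tuned for the toy data, not Surprise's defaults, and unlike Surprise the sketch does not clip estimates to the rating scale.

```python
import random
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user id, item id, rating)

def train_mf(ratings: List[Rating], n_factors: int = 2, n_epochs: int = 200,
             lr: float = 0.05, reg: float = 0.02, seed: int = 0):
    """Biased matrix factorization trained with stochastic gradient descent."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu: Dict[str, float] = {}        # user biases
    bi: Dict[str, float] = {}        # item biases
    pu: Dict[str, List[float]] = {}  # user latent factors
    qi: Dict[str, List[float]] = {}  # item latent factors
    for u, i, _ in ratings:
        bu.setdefault(u, 0.0)
        bi.setdefault(i, 0.0)
        if u not in pu:
            pu[u] = [rng.gauss(0.0, 0.1) for _ in range(n_factors)]
        if i not in qi:
            qi[i] = [rng.gauss(0.0, 0.1) for _ in range(n_factors)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(a * b for a, b in zip(pu[u], qi[i]))
            err = r - (mu + bu[u] + bi[i] + dot)  # prediction error for this rating
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = pu[u][f], qi[i][f]  # use the old values in both updates
                pu[u][f] += lr * (err * qif - reg * puf)
                qi[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, pu, qi

def predict(model, u: str, i: str) -> float:
    """Estimate a rating; unknown users or items fall back to the biases that are known."""
    mu, bu, bi, pu, qi = model
    est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
    if u in pu and i in qi:
        est += sum(a * b for a, b in zip(pu[u], qi[i]))
    return est

# Toy ratings (made up) on a 1-5 scale.
ratings = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0),
           ("u2", "i2", 1.0), ("u3", "i1", 4.0), ("u3", "i3", 3.0)]
model = train_mf(ratings)
print(round(predict(model, "u3", "i2"), 2))  # u3 has never rated i2
```

The biases capture "this user rates generously" or "this item is widely liked", while the factor vectors capture finer taste patterns shared across users.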

All Together –

from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Sample data
data = Dataset.load_builtin('ml-100k')

# Split data into training and test sets
trainset, testset = train_test_split(data, test_size=0.25)

# Use SVD algorithm
algo = SVD()

# Train the algorithm
algo.fit(trainset)

# Test the algorithm
predictions = algo.test(testset)

# Calculate RMSE
rmse(predictions)

# Make a prediction for a specific user and item
user_id = str(196)
item_id = str(302)
predicted_rating = algo.predict(user_id, item_id)
print(predicted_rating)

4. Agriculture: John Deere’s Precision Agriculture (using Random Forest)

Problem: Farmers need to optimize crop yields while minimizing the use of resources like water, fertilizers, and pesticides.

Solution: John Deere’s precision agriculture solutions use machine learning and IoT sensors to analyze soil health, weather patterns, and crop performance.

Example Model: Random Forest for crop yield prediction

Sample Code: Using Random Forest for crop yield prediction

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
  • Import necessary libraries:
  • pandas for data manipulation.
  • RandomForestRegressor from sklearn.ensemble for the Random Forest model.
  • train_test_split from sklearn.model_selection for splitting the dataset.
  • mean_squared_error from sklearn.metrics for evaluating the model.

data = {
    'soil_moisture': [20, 30, 35, 45, 55],
    'temperature': [70, 75, 65, 80, 72],
    'fertilizer_usage': [10, 20, 15, 30, 25],
    'yield': [200, 300, 250, 400, 350]
}

df = pd.DataFrame(data)

  • Create DataFrame:
  • pd.DataFrame(data) creates a DataFrame from the dictionary data, which includes features like soil moisture, temperature, fertilizer usage, and the target variable, crop yield.
X = df[['soil_moisture', 'temperature', 'fertilizer_usage']]
y = df['yield']
  • Features and target:
  • X contains the features (soil moisture, temperature, fertilizer usage).
  • y contains the target variable (crop yield).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  • Split data:
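  • train_test_split(X, y, test_size=0.2, random_state=42) splits the dataset into training (80%) and testing (20%) sets. random_state=42 ensures reproducibility. With only five rows, the test set holds a single sample, so the resulting error is purely illustrative.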
model = RandomForestRegressor(n_estimators=100)
  • Initialize model:
  • RandomForestRegressor(n_estimators=100) creates a Random Forest regressor with 100 trees.
model.fit(X_train, y_train)
  • Train model:
  • model.fit(X_train, y_train) trains the Random Forest model on the training dataset.
y_pred = model.predict(X_test)
  • Predict:
  • model.predict(X_test) makes predictions on the test dataset.
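A random forest with n_estimators=100 builds 100 decision trees, each trained on a bootstrap sample of the rows (drawn with replacement) with randomized feature choices, and its prediction is the average of the trees' outputs. The toy sketch below shows that mechanism in plain Python using depth-1 trees ("stumps") instead of full trees. The stump simplification, the max_features=2 setting and the helper names are assumptions of this sketch; scikit-learn grows much deeper trees and has its own max_features option with its own default.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left mean, right mean)

def fit_stump(X: List[Sequence[float]], y: List[float], features: List[int]) -> Stump:
    """Pick the single split on the allowed features that minimizes squared error."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no usable split in this sample: predict its mean everywhere
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_estimators: int = 100, max_features: int = 2, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap: rows drawn with replacement
        feats = rng.sample(range(n_features), max_features)  # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest: List[Stump], row: Sequence[float]) -> float:
    """Each stump predicts a leaf mean; the forest averages them."""
    return mean(lm if row[f] <= thr else rm for f, thr, lm, rm in forest)

# Same toy data as above: soil_moisture, temperature, fertilizer_usage -> yield.
X = [[20, 70, 10], [30, 75, 20], [35, 65, 15], [45, 80, 30], [55, 72, 25]]
y = [200, 300, 250, 400, 350]
forest = fit_forest(X, y)
print(round(predict_forest(forest, [40, 72, 20]), 1))
```

Averaging many trees that each saw slightly different data is what makes the forest less sensitive to noise than any single tree.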
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
  • Evaluate:
  • mean_squared_error(y_test, y_pred) calculates the Mean Squared Error (MSE) to evaluate the model’s performance.
  • print(f'Mean Squared Error: {mse}') prints the MSE value.

All together –

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample data
data = {
    'soil_moisture': [20, 30, 35, 45, 55],
    'temperature': [70, 75, 65, 80, 72],
    'fertilizer_usage': [10, 20, 15, 30, 25],
    'yield': [200, 300, 250, 400, 350]
}

df = pd.DataFrame(data)

# Features and target
X = df[['soil_moisture', 'temperature', 'fertilizer_usage']]
y = df['yield']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model
model = RandomForestRegressor(n_estimators=100)

# Train model
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

5. Transportation: Waymo’s Self-Driving Cars (using YOLO)

Problem: Developing safe and efficient autonomous vehicles to reduce traffic accidents and improve transportation efficiency.

Solution: Waymo uses deep learning and computer vision to interpret and respond to real-time road conditions, traffic signals, and obstacles.

Example Model: YOLO (You Only Look Once) for object detection

Sample Code: Using YOLO for real-time object detection

import cv2
import numpy as np
  • Import necessary libraries:
  • cv2 is the OpenCV library used for image processing.
  • numpy (imported as np) is used for numerical operations.
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
  • Load YOLO:
  • cv2.dnn.readNet("yolov3.weights", "yolov3.cfg") loads the YOLO model with the specified weights and configuration files.
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
  • Get output layers:
  • net.getLayerNames() retrieves the names of all layers in the network.
  • net.getUnconnectedOutLayers() retrieves the 1-based indices of the unconnected output layers. Older OpenCV versions return them as an N x 1 array and newer ones as a flat array, so .flatten() lets the same code work with both.
  • output_layers lists the names of the output layers.
img = cv2.imread("test_image.jpg")
height, width, channels = img.shape
  • Load image:
  • cv2.imread("test_image.jpg") loads the image from the specified file.
  • img.shape retrieves the dimensions of the image (height, width, number of color channels).
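blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)
  • Detecting objects:
  • cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False) creates a blob from the image: it scales pixel values by 0.00392 (about 1/255, mapping them to the range 0 to 1), resizes the image to 416x416, subtracts a zero mean, and swaps the red and blue channels (True), because OpenCV loads images in BGR order.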
  • net.setInput(blob) sets the blob as the input to the network.
  • net.forward(output_layers) runs forward propagation to compute the output from the network.
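Each row in outs describes one candidate box, which is why the loop below slices detection[5:]: the first four values are the box center and size (normalized to the image dimensions), the fifth is an objectness score, and the rest are per-class scores (80 of them if you load the COCO-trained yolov3 weights, which is an assumption about your files). The hypothetical helper below restates the loop's parsing as a small pure-Python function so it can be tested without OpenCV; the row values are made up.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels; (x, y) is the top-left corner

def decode_detection(detection: List[float], width: int, height: int,
                     conf_threshold: float = 0.5) -> Optional[Tuple[int, float, Box]]:
    """Parse one YOLO output row: [cx, cy, w, h, objectness, class scores...]."""
    scores = detection[5:]  # class scores start after the 4 box values and objectness
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None
    # Box values are normalized to [0, 1]; scale them back to pixel units.
    center_x = int(detection[0] * width)
    center_y = int(detection[1] * height)
    w = int(detection[2] * width)
    h = int(detection[3] * height)
    return class_id, confidence, (int(center_x - w / 2), int(center_y - h / 2), w, h)

# A made-up row with 3 classes for a 640x480 image.
print(decode_detection([0.5, 0.5, 0.2, 0.4, 0.9, 0.1, 0.8, 0.1], 640, 480))
# (1, 0.8, (256, 144, 128, 192))
```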
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(img, str(class_id), (x, y - 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 2)
  • Show information on the screen:
  • Loops through each output and each detection.
  • scores = detection[5:] extracts the confidence scores for each class.
  • class_id = np.argmax(scores) gets the class with the highest score.
  • confidence = scores[class_id] gets the confidence of the detected class.
  • If confidence > 0.5, it calculates the bounding box coordinates and draws the rectangle and text on the image.
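Because the loop draws every box above 0.5, a single object often ends up with several overlapping boxes. Detectors usually follow this step with non-maximum suppression (NMS): keep the most confident box, discard any remaining box whose intersection over union (IoU) with a kept box exceeds a threshold, and repeat. OpenCV ships this as cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold); the sketch below is a plain-Python, class-agnostic version for illustration, with an assumed IoU threshold of 0.4 and made-up boxes.

```python
from typing import List, Sequence

def iou(a: Sequence[int], b: Sequence[int]) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, confidences, iou_threshold: float = 0.4) -> List[int]:
    """Greedy NMS: returns the indices of the boxes to keep, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two near-duplicate boxes on one object plus a separate box elsewhere.
boxes = [(100, 100, 50, 50), (105, 102, 50, 50), (300, 300, 40, 40)]
confidences = [0.9, 0.75, 0.6]
print(non_max_suppression(boxes, confidences))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.76, so the less confident one is dropped; the third box does not overlap either and survives.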
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
  • Display image:
  • cv2.imshow("Image", img) displays the image in a window.
  • cv2.waitKey(0) waits for a key press to close the window.
  • cv2.destroyAllWindows() closes the window and frees the resources.

All together –

import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Load image
img = cv2.imread("test_image.jpg")
height, width, channels = img.shape

# Detecting objects
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Show information on the screen
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(img, str(class_id), (x, y - 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 2)

# Display image
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

These case studies, along with detailed explanations and practical examples, illustrate how AI and machine learning are applied in various industries to solve real-world problems effectively. By providing code snippets and thorough explanations, we hope to make these advanced technologies more accessible and understandable.
