Fine-Tuning BERT for Text Classification | by Shaw Talebi | Oct, 2024



We’ll start by importing a few useful libraries.

from datasets import load_dataset, DatasetDict, Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
import evaluate
import numpy as np
from transformers import DataCollatorWithPadding

Next, we’ll load the training dataset. It consists of 3,000 text-label pairs with a 70–15–15 train-test-validation split. The data are originally from here (open database license).

dataset_dict = load_dataset("shawhin/phishing-site-classification")
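If you want to double-check the splits before moving on, a quick sanity check like the one below (an optional addition) prints the split sizes and one sample text-label pair.

# optional: inspect the splits and a sample record
print(dataset_dict)                  # shows train/test/validation sizes
print(dataset_dict["train"][0])      # one raw text-label pair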

The Transformers library makes it super easy to load and adapt pre-trained models. Here’s what that looks like for the BERT model.

# define pre-trained model path
model_path = "google-bert/bert-base-uncased"

# load model tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# load model with binary classification head
id2label = {0: "Safe", 1: "Not Safe"}
label2id = {"Safe": 0, "Not Safe": 1}
model = AutoModelForSequenceClassification.from_pretrained(model_path,
                                                           num_labels=2,
                                                           id2label=id2label,
                                                           label2id=label2id)
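Note that this classification head is randomly initialized and gets trained from scratch. If you’re curious, you can peek at it and at the overall parameter count; model.classifier and num_parameters() are standard attributes of Hugging Face sequence classification models, so a small sketch like this should work.

# optional: inspect the new classification head and total parameter count
print(model.classifier)                        # Linear layer mapping hidden states to 2 labels
print(f"{model.num_parameters():,} parameters")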

When we load a model like this, all of the parameters will be set as trainable by default. However, training all 110M parameters can be computationally costly and potentially unnecessary.

Instead, we can freeze most of the model parameters and only train the model’s final layer and classification head.

# freeze all base model parameters
for name, param in model.base_model.named_parameters():
    param.requires_grad = False

# unfreeze base model pooling layers
for name, param in model.base_model.named_parameters():
    if "pooler" in name:
        param.requires_grad = True
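One way to confirm the freeze behaved as expected (an optional check) is to count trainable versus total parameters; only the pooler and the classification head should contribute to the first number.

# optional: verify that only the pooler and classification head are trainable
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")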

Next, we’ll need to preprocess our data. This will consist of two key operations: tokenizing the URLs (i.e., converting them into integers) and truncating them.

# define text preprocessing
def preprocess_function(examples):
    # return tokenized text with truncation
    return tokenizer(examples["text"], truncation=True)

# preprocess all datasets
tokenized_data = dataset_dict.map(preprocess_function, batched=True)
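To see what the preprocessing produces, you can tokenize a single training example by hand (an illustrative sketch; the exact token IDs depend on the example, but the output keys are standard for BERT tokenizers).

# optional: preview the tokenizer output for one example
sample = dataset_dict["train"][0]["text"]
print(tokenizer(sample, truncation=True))
# -> dict with 'input_ids', 'token_type_ids', and 'attention_mask'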

Another important step is creating a data collator that will dynamically pad token sequences in a batch during training so that they all have the same length. We can do this in one line of code.

# create data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
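To illustrate what dynamic padding does, here is a small sketch (not needed for training) that pads a handful of tokenized examples by hand; I keep only the input_ids and attention_mask fields so the collator doesn’t trip over the raw text column.

# optional: demonstrate dynamic padding on a tiny batch
features = [tokenized_data["train"][i] for i in range(4)]
features = [{k: f[k] for k in ("input_ids", "attention_mask")} for f in features]
batch = data_collator(features)
print(batch["input_ids"].shape)  # every sequence padded to the longest in the batch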

As a final step before training, we can define a function to compute a set of metrics to help us monitor training progress. Here, we’ll consider model accuracy and AUC.

# load metrics
accuracy = evaluate.load("accuracy")
auc_score = evaluate.load("roc_auc")

def compute_metrics(eval_pred):
    # get predictions
    predictions, labels = eval_pred

    # apply softmax to get probabilities
    probabilities = np.exp(predictions) / np.exp(predictions).sum(-1, keepdims=True)
    # use probabilities of the positive class for ROC AUC
    positive_class_probs = probabilities[:, 1]
    # compute auc
    auc = np.round(auc_score.compute(prediction_scores=positive_class_probs,
                                     references=labels)['roc_auc'], 3)

    # predict most probable class
    predicted_classes = np.argmax(predictions, axis=1)
    # compute accuracy
    acc = np.round(accuracy.compute(predictions=predicted_classes,
                                    references=labels)['accuracy'], 3)

    return {"Accuracy": acc, "AUC": auc}
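It can be worth smoke-testing this function on made-up logits and labels before handing it to the trainer (an optional check; the dummy values below are arbitrary but include both classes so the ROC AUC is defined).

# optional: smoke-test compute_metrics with dummy logits and labels
dummy_logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.2, 0.3]])
dummy_labels = np.array([0, 1, 0])
print(compute_metrics((dummy_logits, dummy_labels)))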

Now, we’re ready to fine-tune our model. We start by defining hyperparameters and other training arguments.

# hyperparameters
lr = 2e-4
batch_size = 8
num_epochs = 10

training_args = TrainingArguments(
    output_dir="bert-phishing-classifier_teacher",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

Then, we pass our training arguments into a trainer class and train the model.

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

The training results are shown below. We can see that the training and validation loss are monotonically decreasing while the accuracy and AUC increase with each epoch.

Training results. Image by author.

As a final test, we can evaluate the performance of the model on the independent validation data, i.e., data not used for training or setting hyperparameters.

# apply model to validation dataset
predictions = trainer.predict(tokenized_data["validation"])

# extract the logits and labels from the predictions object
logits = predictions.predictions
labels = predictions.label_ids

# use the compute_metrics function defined earlier
metrics = compute_metrics((logits, labels))
print(metrics)

# >> {'Accuracy': 0.889, 'AUC': 0.946}
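To use the fine-tuned model on a single new URL, you can tokenize the string and take the argmax over the logits. This is a minimal inference sketch; the URL below is made up for illustration.

import torch

# classify a new (made-up) URL with the fine-tuned model
inputs = tokenizer("http://example.com/login", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # "Safe" or "Not Safe"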

Bonus: Although a 110M parameter model is tiny compared to modern language models, we can reduce its computational requirements using model compression techniques. I cover how to reduce the model’s memory footprint by 7X in the article below.

Fine-tuning pre-trained models is a powerful paradigm for creating better models at a lower cost than training them from scratch. Here, we saw how to do this with BERT using the Hugging Face Transformers library.

While the example code was for URL classification, it can be readily adapted to other text classification tasks.
