Active Machine Learning with Python by Margaux Masson-Forsythe

Author: Margaux Masson-Forsythe
Language: eng
Format: epub
Publisher: Packt Publishing Ltd.
Published: 2024-03-27T00:00:00+00:00


Applying uncertainty sampling to improve classification performance

We will choose the most informative images to label next from our dataset – namely, the frames where the model is least confident, a method discussed in Chapter 2, Designing Query Strategy Frameworks.

We first define a function to get the model’s uncertainty scores:

def least_confident_score(predicted_probs):
    # 1 minus the probability of the most likely class
    return 1 - predicted_probs[np.argmax(predicted_probs)]
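As a quick sanity check on this scoring function, the sketch below applies it to two made-up probability vectors (the values are illustrative and do not come from the book's model). A peaked distribution should score close to 0, while a uniform distribution over C classes should score 1 - 1/C.

```python
import numpy as np

def least_confident_score(predicted_probs):
    # 1 minus the probability of the most likely class
    return 1 - predicted_probs[np.argmax(predicted_probs)]

# Hypothetical softmax outputs for a 3-class problem
confident = np.array([0.90, 0.05, 0.05])
uncertain = np.array([1 / 3, 1 / 3, 1 / 3])

confident_score = least_confident_score(confident)
uncertain_score = least_confident_score(uncertain)
print(confident_score, uncertain_score)
```

The uniform vector gets the larger score, which is why sorting by this value surfaces the images the model is least sure about.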

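Then, we define our data loader for the unlabeled set. We use a batch size of 1 because we loop through the images one at a time to get each uncertainty score. Shuffling is left off (the DataLoader default), so the position of each score matches the image's index in the dataset: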

unlabeled_loader = DataLoader(full_dataset, batch_size=1)

We collect the least-confidence scores for our set of unlabeled images:

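least_confident_scores = []
model.eval()  # inference mode for layers such as dropout and batch norm
with torch.no_grad():  # gradients are not needed for scoring
    for image, _ in unlabeled_loader:
        probs = F.softmax(model(image), dim=1)
        score = least_confident_score(probs.detach().numpy()[0])
        least_confident_scores.append(score)
print(least_confident_scores)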

This returns the following:

[0.637821763753891, 0.4338147044181824, 0.18698161840438843, 0.6028554439544678, 0.35655343532562256, 0.3845849633216858, 0.4887065887451172, ...]
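Looping one image at a time is simple but can be slow on a large pool. A minimal sketch of a batched variant follows, assuming a stand-in linear model and random tensors in place of the book's `model` and `full_dataset`; the names `toy_model`, `toy_dataset`, and `batched_scores` and the batch size of 64 are illustrative choices. It computes one minus the row-wise maximum probability, which is the same quantity that `least_confident_score` returns for a single image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-ins (assumptions): 256 random 3x8x8 "images" and a tiny 4-class classifier
images = torch.randn(256, 3, 8, 8)
labels = torch.zeros(256, dtype=torch.long)  # placeholder labels, unused for scoring
toy_dataset = TensorDataset(images, labels)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))

# shuffle=False keeps score positions aligned with dataset indices
loader = DataLoader(toy_dataset, batch_size=64, shuffle=False)

toy_model.eval()
score_chunks = []
with torch.no_grad():  # no gradients needed for scoring
    for batch, _ in loader:
        probs = F.softmax(toy_model(batch), dim=1)
        # least-confident score per row: 1 - max class probability
        score_chunks.append(1 - probs.max(dim=1).values)

# One score per image, in dataset order
batched_scores = torch.cat(score_chunks)
```

Because the loader is not shuffled, `batched_scores[i]` belongs to the image at dataset index `i`, so the result can be sorted in the same way as the per-image list.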

The values in least_confident_scores are the model's least-confidence scores, one per image: the higher the score, the less confident the model is in its prediction. Next, we want the indices of the images with the highest scores. We choose to select 200 images (queries):

num_queries = 200

Then, we sort the scores in ascending order of uncertainty:

sorted_uncertainties, indices = torch.sort(torch.tensor(least_confident_scores))

Because the sort is ascending, the last num_queries entries of indices are the original indices of the most uncertain samples. We select them and print the results:

most_uncertain_indices = indices[-num_queries:]
print(f"sorted_uncertainties: {sorted_uncertainties} \nmost_uncertain_indices selected: {most_uncertain_indices}")

This returns the following:

sorted_uncertainties: tensor([0.0000, 0.0000, 0.0000, ..., 0.7419, 0.7460, 0.7928], dtype=torch.float64)
most_uncertain_indices selected: tensor([45820, 36802, 15912, 8635, 32207, 11987, 39232, 6099, 18543, 29082, 42403, 21331, 5633, 29284, 29566, 23878, 47522, 17097, 15229, 11468, 18130, 45120, 25245, 19864, 45457, 20434, 34309, 10034, 45285, 25496, 40169, 31792, 22868, 35525, 31238, 24694, 48734, 18419, 45289, 16126, 31668, 45971, 26393, ... 44338, 19687, 18283, 23128, 20556, 26325])
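A full sort does more work than is needed when only the top entries matter. As an alternative sketch (not the book's code), `torch.topk` returns the k largest scores and their indices directly, ordered from most to least uncertain. The scores below are made up for illustration.

```python
import torch

# Hypothetical uncertainty scores for six images
scores = torch.tensor([0.10, 0.75, 0.30, 0.62, 0.05, 0.48])
k = 3

# k largest values and their positions, in descending order of uncertainty
top_values, top_indices = torch.topk(scores, k)

# Same selection via a full ascending sort, then taking the last k entries
_, sorted_indices = torch.sort(scores)
from_sort = sorted_indices[-k:]

print(top_indices.tolist())  # most uncertain first
print(from_sort.tolist())    # same indices, least uncertain of the k first
```

Both approaches pick the same set of images; only the ordering within the selection differs, which matters if the first few entries are later displayed or prioritized.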

Now we have the indices of the images selected using our active ML least-confident strategy. These are the images that would be sent to our oracles to be labeled and then used to train the model again.
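The bookkeeping for that hand-off can be sketched as follows, assuming a small made-up pool and selection: the queried indices go to labeling, and the rest stay in the unlabeled pool for the next round. The variable names here are illustrative and do not come from the book.

```python
# Hypothetical pool of 10 images and 3 queried indices
pool_size = 10
queried_indices = [7, 2, 9]

queried = set(queried_indices)
# Indices that remain unlabeled after this round, in dataset order
remaining_indices = [i for i in range(pool_size) if i not in queried]

print(sorted(queried))     # sent to the oracle for labeling
print(remaining_indices)   # kept as the unlabeled pool
```

With a PyTorch dataset, each index list could be wrapped with `torch.utils.data.Subset(full_dataset, indices)` to obtain the two splits as datasets.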

Let’s take a look at five of these selected images:

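fig, axs = plt.subplots(1, 5)
for i in range(5):
    # Convert the tensor index to a plain int before indexing the dataset
    image, label = full_dataset[int(most_uncertain_indices[i])]
    # Move channels last for imshow and undo normalization
    # (assuming the images were normalized with mean 0.5 and std 0.5)
    image = image.squeeze().permute(1, 2, 0) / 2 + 0.5
    axs[i].imshow(image)
    axs[i].axis('off')
plt.show()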


