Hope you know Python. I'm also assuming you already know some NumPy and how
neural networks work. If you don't, go read
Grokking Deep Learning now and come back in a few weeks. It's my
favorite ML book. No, they aren't paying me.
If you already know TensorFlow or Keras, this should feel familiar.
Whether you are on Linux, Windows, or Mac, the easy way to install
PyTorch is with pip.
Go to the install page on pytorch.org and use the selector there to generate a pip command. If you have an NVIDIA GPU newer than 2015 or so, you can use a CUDA version. Otherwise, select None (labelled CPU on newer versions of the page) in the CUDA section of the pip command generator.
You might hit some pip errors along the way, but with some googling and a few extra pip installs, you should be able to
run import torch in your Python shell.
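A quick way to check that the install worked. This is only a sketch: the version string and the CUDA answer depend entirely on your machine, so don't expect any particular output.

```python
import torch

# Version string of the installed package
version = torch.__version__
print("torch version:", version)

# True only if this build has CUDA support AND a usable GPU is visible
has_cuda = torch.cuda.is_available()
print("CUDA available:", has_cuda)

# Pick a device the rest of your code can use either way
device = torch.device("cuda:0" if has_cuda else "cpu")
print("using device:", device)
```

If the import fails, the install didn't take. If it works but CUDA shows as unavailable, the CPU build is still fine for everything on this page.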
a = torch.tensor([1,2,3])
These are a lot like NumPy arrays. Most of the functions NumPy arrays
have, tensors have too. That includes most of the fancy NumPy
indexing tricks. Some of the functions have different names, though.
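To make that concrete, here is a small sketch of NumPy-style indexing on tensors, plus one function that goes by a different name. The variable names are just mine for illustration.

```python
import numpy as np
import torch

a = torch.tensor([1, 2, 3])
m = torch.arange(12).reshape(3, 4)   # like np.arange(12).reshape(3, 4)

# Slicing, boolean masks, and integer-array indexing work as in NumPy
first_column = m[:, 0]               # tensor([0, 4, 8])
evens = m[m % 2 == 0]                # all even entries, flattened
picked = m[[0, 2], [1, 3]]           # elements (0,1) and (2,3): tensor([1, 11])

# Same idea, different name: np.concatenate becomes torch.cat,
# and the axis argument is usually spelled dim
joined = torch.cat([a, a], dim=0)    # tensor([1, 2, 3, 1, 2, 3])
row_sums = m.sum(dim=1)              # tensor([6, 22, 38])

# Converting back and forth
as_numpy = a.numpy()                 # NumPy array sharing memory with the CPU tensor
back = torch.from_numpy(np.array([4, 5, 6]))

print(first_column, evens, picked, joined, row_sums, as_numpy, back)
```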
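One key difference is that tensors can keep track of what operation was used to create them. A tracked tensor remembers its parent tensors, so PyTorch can work backwards and compute derivatives with respect to them. That feature is what makes PyTorch powerful. It is called autograd. The machinery is on by default, but a tensor you create yourself is only tracked if you pass requires_grad=True (the parameters inside nn layers get this automatically), and you can turn tracking on and off for each tensor individually.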
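Here is a rough sketch of switching tracking on and off for individual tensors. The tensor names are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0])                             # plain tensor: not tracked
tracked = torch.tensor([1.0, 2.0], requires_grad=True)   # tracked from the start

print(w.requires_grad)        # False
print(tracked.requires_grad)  # True

# Flip tracking on for an existing tensor, in place
w.requires_grad_(True)

# Anything computed from a tracked tensor is tracked too
y = (w * tracked).sum()
print(y.requires_grad)        # True

# detach() gives a tensor with the same values but no history
frozen = tracked.detach()

# Inside torch.no_grad(), nothing is recorded at all
with torch.no_grad():
    z = w * 2
print(frozen.requires_grad, z.requires_grad)  # False False
```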
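Here is a tiny example:

    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    b = torch.tensor([3.0], requires_grad=True)
    c = a * b                      # c holds the values [3., 6., 9.]
    err = ((c - 5.0) ** 2).sum()   # backward() needs a single number, so sum the squared errors
    err.backward()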
This runs back through err to its parent c and computes the derivatives of each operation along the way. It keeps going until it has worked out the effect both a and b had on c, and therefore on err. The results are stored on the tensors themselves, in a.grad and b.grad. If a and b had parents of their own, it would follow through and do those too, and so on. I'm hoping you know how to do that manually (Grokking Deep Learning), but anyway, autograd is nice.
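To see what backward() actually leaves behind, here is a sketch that reruns the example, reads the gradients, and then nudges a and b by hand. The learning_rate value is my own pick for illustration. Note that a and b need requires_grad=True, and err has to be a single number, for this to run.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

c = a * b                       # values [3., 6., 9.]
err = ((c - 5.0) ** 2).sum()    # 4 + 1 + 16 = 21
err.backward()

# d(err)/da = 2 * (c - 5) * b
grad_a = a.grad.clone()
print(grad_a)                   # tensor([-12.,   6.,  24.])
# d(err)/db = sum(2 * (c - 5) * a)
grad_b = b.grad.clone()
print(grad_b)                   # tensor([24.])

# A manual update: step each tensor against its gradient.
learning_rate = 0.01            # hypothetical value, chosen for the example
err_before = err.item()
with torch.no_grad():           # the update itself should not be tracked
    a -= learning_rate * a.grad
    b -= learning_rate * b.grad

# Gradients accumulate, so clear them before the next backward()
a.grad.zero_()
b.grad.zero_()

err_after = ((a * b - 5.0) ** 2).sum().item()
print(err_before, err_after)    # the second number should be smaller
```

Doing this by hand for every tensor gets old fast, which is what optimizers are for.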
There are also functions and classes for creating and sampling distributions, shrinking your neural network, data loading and transformation, ... honestly it's a lot. I thought I had used a reasonable portion of it, but I probably haven't touched the majority of what's available. At the least, don't expect to know all of it right away.
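A small taste of three of those areas, as a sketch only. I'm taking "shrinking" to mean pruning here, which is an assumption on my part, and the toy data and layer sizes are made up.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.distributions import Normal
from torch.utils.data import DataLoader, TensorDataset

# Distributions: build one, sample from it, score a value
dist = Normal(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
samples = dist.sample((5,))                 # 5 random draws, shape [5]
log_p = dist.log_prob(torch.tensor(0.0))    # log density of the standard normal at 0

# Data loading: wrap tensors in a dataset and iterate over it in batches
xs = torch.arange(10, dtype=torch.float32).reshape(10, 1)
ys = 2.0 * xs
loader = DataLoader(TensorDataset(xs, ys), batch_size=4, shuffle=True)
batch_sizes = [len(x_batch) for x_batch, y_batch in loader]   # [4, 4, 2]

# Shrinking (pruning): zero out the half of a layer's weights
# with the smallest absolute value
layer = nn.Linear(4, 4)
prune.l1_unstructured(layer, name="weight", amount=0.5)
num_zeroed = int((layer.weight == 0).sum())

print(samples.shape, log_p.item(), batch_sizes, num_zeroed)
```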
Usually people don't manually update individual tensors like that. They make a network class instead.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim

    class ExampleNetwork(nn.Module):
        def __init__(self, inputSize, hiddenSize, outputSize):
            super().__init__()
            self.layer1 = nn.Linear(inputSize, hiddenSize)
            self.layer2 = nn.Linear(hiddenSize, outputSize)
            # when you make layers with nn.Linear() or nn.Conv2d() and assign them
            # to self, their weights automagically show up in net.parameters()

        def forward(self, x):
            # this function is called when you put data into the network
            x = F.relu(self.layer1(x))  # torch.nn.functional (F) has lots of activation functions
            x = self.layer2(x)          # don't use an activation function on the output layer here
            return x

    # make a network
    net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)

    # send the network to your device
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    device = torch.device("cpu")  # use this one if you don't have cuda
    net.to(device)

    # make some data
    inputs = torch.tensor([1.0, 2.0])  # x
    target = torch.tensor([5.0])       # y, shaped like the network's output

    # put the data on the device .__. cumbersome yes
    # unlike net.to(), tensor.to() returns a new tensor, so keep the result
    inputs = inputs.to(device)  # data must be on same device as neural network
    target = target.to(device)

    # pick a network stepping function and error function
    # Adam adapts the step size for each weight using running averages of recent gradients
    optimizer = optim.Adam(net.parameters(), lr=0.001)
    # a loss function... it's common, but technically you don't even need this.
    # you could just square the difference between output and target yourself
    lossFunction = nn.MSELoss()

    # train your network
    numTrainingRounds = 1000
    for i in range(numTrainingRounds):
        # zero the derivatives !!! ALWAYS DO THIS BEFORE CALLING backward() !!!
        # or else the new derivatives get added on top of last round's... it's wrong!
        net.zero_grad()
        output = net(inputs)                 # run data through neural network, to get its output
        loss = lossFunction(output, target)  # compute the error
        print(loss.item())                   # the error number should decrease as it learns
                                             # IF IT DOESN'T... SOMETHING IS WRONG. AAAAA!!!
        loss.backward()                      # compute derivatives throughout network (backpropagation)
        optimizer.step()                     # tweak network weights based on derivatives computed during backward()
Here it is again without the comments:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim

    class ExampleNetwork(nn.Module):
        def __init__(self, inputSize, hiddenSize, outputSize):
            super().__init__()
            self.layer1 = nn.Linear(inputSize, hiddenSize)
            self.layer2 = nn.Linear(hiddenSize, outputSize)

        def forward(self, x):
            x = F.relu(self.layer1(x))
            x = self.layer2(x)
            return x

    net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    device = torch.device("cpu")
    net.to(device)

    optimizer = optim.Adam(net.parameters(), lr=0.001)
    lossFunction = nn.MSELoss()

    inputs = torch.tensor([1.0, 2.0]).to(device)
    target = torch.tensor([5.0]).to(device)

    for i in range(1000):
        net.zero_grad()
        output = net(inputs)
        loss = lossFunction(output, target)
        print(loss.item())
        loss.backward()
        optimizer.step()
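The loop above trains on a single example, which is fine for seeing the moving parts. As a sketch of the next step, here is the same network trained on a whole batch at once and then used for a prediction. The dataset, the target rule (x1 + 2 * x2), the seed, and the learning rate are all made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class ExampleNetwork(nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super().__init__()
        self.layer1 = nn.Linear(inputSize, hiddenSize)
        self.layer2 = nn.Linear(hiddenSize, outputSize)

    def forward(self, x):
        return self.layer2(F.relu(self.layer1(x)))

torch.manual_seed(0)  # only so repeated runs behave the same
net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)
optimizer = optim.Adam(net.parameters(), lr=0.01)
lossFunction = nn.MSELoss()

# A made-up dataset: 64 random (x1, x2) pairs, target is x1 + 2 * x2
inputs = torch.rand(64, 2)
targets = (inputs[:, 0] + 2.0 * inputs[:, 1]).unsqueeze(1)  # shape [64, 1]

losses = []
for i in range(500):
    optimizer.zero_grad()                  # same effect as net.zero_grad() here
    outputs = net(inputs)                  # shape [64, 1], one prediction per row
    loss = lossFunction(outputs, targets)  # mean squared error over the batch
    losses.append(loss.item())
    loss.backward()
    optimizer.step()

print(losses[0], losses[-1])               # first and last loss; the last should be smaller

# Using the trained network: no gradients needed, so skip the bookkeeping
with torch.no_grad():
    prediction = net(torch.tensor([[0.5, 0.5]]))  # the made-up rule says 1.5
print(prediction)
```

The first dimension of the input is the batch, so the network code doesn't change at all. Only the shapes of the data do.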
The tutorial in the PyTorch docs isn't so bad. It is a bit thorough, though. There are a lot more rigorous explanations there.
And if you went through this page, played with the code, and tried the
PyTorch docs tutorial, and a lot of this stuff still doesn't make sense,
you probably ignored the prerequisites and kept going.
That's good spirit, and I'm proud of you. It means you are the perfect person to go read Grokking Deep Learning. I promise it's worth a few weeks of your time. It changed my life.