Hope you know Python. I'm also assuming you already know some NumPy and how
neural networks work. If you don't, go read
Grokking Deep Learning now, and come back in a few weeks. It's my
favorite ML book. No, they aren't paying me.

If you already know TensorFlow or Keras, this should feel familiar to
you.

Whether you are on Linux, Windows, or Mac, the easy way to install
PyTorch is with pip.

Go to the install page on the PyTorch website and
generate a pip command. If you have an NVIDIA GPU newer than 2015 or so,
you can use the CUDA version. Otherwise just select None in the CUDA
section of the pip command generator.

You might get some pip errors and whatnot, but with some googling and
some other pip installs, you should be able to
`import torch`

in your python shell.
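Once the import works, a quick sanity check can look something like this. It's only a sketch: it prints whatever version pip gave you and whether PyTorch can see a CUDA GPU, so the output will differ from machine to machine. The name `device_name` is just something I made up for the example.

```python
import torch

# version string of whatever pip installed (varies by machine)
print(torch.__version__)

# True only if you installed a CUDA build and have a usable NVIDIA GPU
cuda_ok = torch.cuda.is_available()
print("cuda available:", cuda_ok)

# pick a device name to use later on
device_name = "cuda:0" if cuda_ok else "cpu"
print("using:", device_name)
```

If this prints `using: cpu` on a machine with an NVIDIA card, you most likely installed the CPU-only build.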

`a = torch.tensor([1,2,3])`

That makes a tensor. Tensors are a lot like NumPy arrays. Most of the functions NumPy arrays
have, tensors have too. That includes all the fancy NumPy
indexing tricks. Some of the functions have different names, though.
**Ex:** reshape()/view().
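Here's a small side-by-side sketch of the NumPy and tensor versions of the same operations. The variable names are made up for illustration, and it assumes NumPy is installed alongside PyTorch.

```python
import numpy as np
import torch

n = np.arange(6, dtype=np.float32)         # numpy array holding 0..5
t = torch.arange(6, dtype=torch.float32)   # the tensor version

# same slicing and boolean-mask indexing as numpy
evens_np = n[::2]
evens_t = t[::2]
big_t = t[t > 2.0]   # keeps only the elements greater than 2

# numpy's reshape() next to the tensor's view()
grid_np = n.reshape(2, 3)
grid_t = t.view(2, 3)

# going back and forth between the two
from_np = torch.from_numpy(n)   # shares memory with n
back_to_np = grid_t.numpy()     # works for CPU tensors

print(evens_t, big_t, grid_t.shape)
```

Tensors also have a reshape() of their own, so old NumPy habits mostly keep working.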

One key difference is that tensors can keep track of what operation was
used to create them. That means they know their own parent
tensors and how to get the respective derivatives. That feature is what makes
PyTorch powerful. It is called **autograd**. A tensor is only tracked if it
was made with requires_grad=True (or computed from a tensor that was),
and you can turn tracking off and on for each tensor
individually.
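A rough sketch of the per-tensor switch, using made-up tensors `w` and `x`. It uses `requires_grad_()`, `torch.no_grad()` and `detach()`, which are the usual ways I know of to turn tracking on and off.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)   # tracked
x = torch.tensor([3.0, 4.0])                       # not tracked

y = (w * x).sum()
print(y.requires_grad)   # True, because y was built from a tracked tensor

# switch tracking off and back on for one tensor
w.requires_grad_(False)
z = (w * x).sum()
print(z.requires_grad)   # False
w.requires_grad_(True)

# temporarily switch tracking off for a whole block of code
with torch.no_grad():
    q = (w * x).sum()
print(q.requires_grad)   # False

# detach() gives an untracked tensor that shares w's data
w_plain = w.detach()
print(w_plain.requires_grad)   # False
```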

```
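import torch
# requires_grad=True tells autograd to track these tensors
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)
c = a * b
# c is tensor([3., 6., 9.])
# backward() needs a single number, so sum the squared errors
err = ((c - 5.0) ** 2).sum()
err.backward()
# a.grad and b.grad now hold d(err)/da and d(err)/db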
```

This runs back through err to its parent c, and computes the
derivatives of the operations. It follows through until it computes
the effects both a and b had on c, and therefore on err, and stores them
in a.grad and b.grad. If a and b
had parents it would follow through and do those. And so on, and so
on. I'm hoping you know how to do that manually (Grokking Deep Learning), but anyways **autograd** is nice.
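To make that concrete, here's a sketch that compares what autograd computes against the chain rule done by hand, then nudges the tensors with one manual gradient-descent step. The `learning_rate` value and the `manual_*` / `grad_*` names are mine, picked for illustration.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

err = ((a * b - 5.0) ** 2).sum()
err.backward()

# what autograd computed: d(err)/da is -12, 6, 24 and d(err)/db is 24
grad_a = a.grad.clone()
grad_b = b.grad.clone()
print(grad_a, grad_b)

# the same derivatives worked out by hand with the chain rule
with torch.no_grad():
    manual_a = 2 * (a * b - 5.0) * b
    manual_b = (2 * (a * b - 5.0) * a).sum()

# one manual gradient-descent step (learning_rate is a made-up value)
learning_rate = 0.01
with torch.no_grad():
    a -= learning_rate * a.grad
    b -= learning_rate * b.grad

# clear the stored derivatives before the next backward()
a.grad.zero_()
b.grad.zero_()

new_err = ((a * b - 5.0) ** 2).sum()
print(err.item(), new_err.item())   # the second number should be smaller
```

The updates sit inside `torch.no_grad()` so that the nudge itself doesn't get recorded as another operation to differentiate.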

PyTorch also has functions and classes for creating and sampling distributions, shrinking your neural network, data loading and transformation, ... honestly it's a lot. I thought I had used a reasonable portion of it, but I probably haven't touched the majority of what's available. At the least, don't expect to know all of it right away.
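Just to give a taste of two of those corners, here's a small sketch with `torch.distributions` and `torch.utils.data`. The data is made up (y is just 2 times x), and the batch size is an arbitrary choice.

```python
import torch
from torch.distributions import Normal
from torch.utils.data import DataLoader, TensorDataset

# a normal distribution with mean 0 and standard deviation 1
dist = Normal(loc=0.0, scale=1.0)
samples = dist.sample((8,))          # 8 random draws
log_probs = dist.log_prob(samples)   # log-density of each draw

# wrap some made-up (x, y) pairs and hand them out in batches of 4
xs = torch.arange(10, dtype=torch.float32).unsqueeze(1)   # shape [10, 1]
ys = 2.0 * xs
loader = DataLoader(TensorDataset(xs, ys), batch_size=4, shuffle=True)

for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape)   # two batches of 4, then one of 2
```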

Usually people don't manually update individual tensors like that. They make a network class instead.

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class ExampleNetwork(nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super().__init__()
        self.layer1 = nn.Linear(inputSize, hiddenSize)
        self.layer2 = nn.Linear(hiddenSize, outputSize)
        # when you make layers with nn.Linear() or nn.Conv2d() and assign them
        # to self, they automagically show up in ExampleNetwork's .parameters()
        # # the parameters hold the weights and such

    def forward(self, x):  # this function is called when you put data into the network
        x = F.relu(self.layer1(x))  # torch.nn.functional (F) has lots of activation functions
        x = self.layer2(x)  # no activation on the output layer here, we want a plain number
        return x


# make a network
net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)

# send the network to your device
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")  # use this one if you don't have cuda
net.to(device)

# make some data
inputs = torch.tensor([1.0, 2.0])  # x
target = torch.tensor([5.0])  # y, shaped like the network's output
# put the data on the device .__. cumbersome yes
# # tensor.to() returns a new tensor, so keep the result
inputs = inputs.to(device)  # data must be on same device as neural network
target = target.to(device)

# pick a network stepping function and error function
# # Adam adapts the step size for each weight as training goes on
optimizer = optim.Adam(net.parameters(), lr=0.001)
# # a loss function... it's common, but technically you don't even need this.
# # # you could just square (output - target) yourself
lossFunction = nn.MSELoss()

# train your network
numTrainingRounds = 1000
for i in range(numTrainingRounds):
    # zero the derivatives !!! ALWAYS DO THIS BEFORE CALLING backward() !!!
    # # or else they pile up on top of last training round's derivatives... it's wrong!
    net.zero_grad()
    output = net(inputs)  # run data through neural network, to get its output
    loss = lossFunction(output, target)  # compute the error
    print(loss.item())  # the error number should decrease as it learns
    # IF IT DOESN'T... SOMETHING IS WRONG. AAAAA!!!
    loss.backward()  # compute derivatives throughout network (backpropagation)
    optimizer.step()  # tweak network weights based on derivatives computed during backward()
```

The same script with the explanatory comments stripped out:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class ExampleNetwork(nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super().__init__()
        self.layer1 = nn.Linear(inputSize, hiddenSize)
        self.layer2 = nn.Linear(hiddenSize, outputSize)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x


net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")
net.to(device)
optimizer = optim.Adam(net.parameters(), lr=0.001)
lossFunction = nn.MSELoss()
inputs = torch.tensor([1.0, 2.0]).to(device)
target = torch.tensor([5.0]).to(device)
for i in range(1000):
    net.zero_grad()
    output = net(inputs)
    loss = lossFunction(output, target)
    print(loss.item())
    loss.backward()
    optimizer.step()
```
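Once a network like this is trained, you'll usually want to peek at its parameters and get predictions out of it. Below is a sketch of both. `TinyNetwork` is a hypothetical stand-in with the layer sizes hard-coded, and the seed, the learning rate of 0.01, and the 200 rounds are my own picks to keep the run short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class TinyNetwork(nn.Module):  # hypothetical stand-in for ExampleNetwork
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 16)
        self.layer2 = nn.Linear(16, 1)

    def forward(self, x):
        return self.layer2(F.relu(self.layer1(x)))


torch.manual_seed(0)  # fixed seed so reruns start from the same weights
net = TinyNetwork()

# every layer assigned to self shows up here, weights and biases both
for name, param in net.named_parameters():
    print(name, tuple(param.shape))
num_weights = sum(p.numel() for p in net.parameters())  # 2*16 + 16 + 16*1 + 1 = 65

inputs = torch.tensor([1.0, 2.0])
target = torch.tensor([5.0])
optimizer = optim.Adam(net.parameters(), lr=0.01)
lossFunction = nn.MSELoss()

losses = []
for i in range(200):
    optimizer.zero_grad()
    loss = lossFunction(net(inputs), target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# use the trained network without recording anything for autograd
with torch.no_grad():
    prediction = net(inputs)
print(losses[0], losses[-1], prediction)
```

Wrapping the prediction in `torch.no_grad()` skips the bookkeeping that backward() would need, which is all wasted work when you only want the answer.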

The tutorial in the PyTorch docs isn't so bad. It is a bit thorough, though. There are a lot more rigorous explanations there.

And if you went through this page, played with the code, and tried the
pytorch docs tutorial, and a lot of this stuff still doesn't make sense,
you probably ignored the prerequisites and kept going.

That's good spirit, and I'm proud of you. It means you are the perfect person
to go read the book Grokking Deep Learning. I promise it's worth
a few weeks of your time. It changed my life.