Wegfawefgawefg's Quickies

Pytorch

Get Going Within ... 30 Minutes To A Few Hours

Prerequisites

Hope you know python. I'm also assuming you know some numpy and how neural networks work already. If you don't, go read Grokking Deep Learning now and come back in a few weeks. It's my favorite ML book. No, they aren't paying me.
If you already know tensorflow or keras this should feel familiar to you.

Installation

Whether you are on linux, windows, or mac, the easy way to install pytorch is with pip.
Go to the install page on pytorch.org and generate a pip command. If you have an nvidia gpu newer than 2015 or so you can use the cuda version. Otherwise just select None in the CUDA section of the pip command generator.
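For what it's worth, the CPU-only command usually boils down to something along the lines of pip install torch, but trust whatever the generator spits out over this page.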
You might get some pip errors and whatnot, but with some googling and some other pip installs, you should be able to import torch in your python shell.

Hand Holding

Tensors
a = torch.tensor([1,2,3])

These are just like numpy arrays. Most of the functions numpy arrays have, the tensors have too, including all the fancy numpy indexing tricks. Some of the functions have different names though, ex: numpy's reshape() is usually view() in pytorch.
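A tiny scratchpad I'm adding just to show the numpy habits carry over:

import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.shape)          #   torch.Size([2, 3])
print(a[0, 1])          #   plain indexing: tensor(2)
print(a[:, 1:])         #   slicing works too
print(a * 2 + 1)        #   elementwise math and broadcasting
print(a.view(3, 2))     #   view() is (roughly) numpy's reshape()
print(a.numpy())        #   and you can hop back to numpy whenever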
One key difference is that tensors can keep track of what operations were used to create them. That means they know their own parent tensors, and pytorch can compute the respective derivatives for you. That feature is called autograd, and it is what makes pytorch powerful. Tracking is off by default for tensors you create yourself and on for network weights, and you can turn it off and on for each tensor individually with the requires_grad flag.
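For example (just the standard requires_grad switch, nothing fancy):

a = torch.tensor([1.0, 2.0, 3.0])                       #   tracking off (the default)
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)   #   tracking on
b.requires_grad_(False)                                  #   ...and back off, in place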

Backprop
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  #   requires_grad turns on tracking
b = torch.tensor([3.0], requires_grad=True)
c = a * b
#   c is tensor([3., 6., 9.], grad_fn=<MulBackward0>)
err = ((c - 5.0) ** 2).sum()    #   backward() wants a single number, so sum the errors
err.backward()

This runs back through err to its parent c and computes the derivatives of the operations. It follows through until it has worked out the effect both a and b had on c, and therefore on err. If a and b had parents it would follow through and do those too, and so on. I'm hoping you know how to do that manually (Grokking Deep Learning), but anyways, autograd is nice.
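If you want to peek at what backward() produced, the derivatives land in each tensor's .grad. Here's a little extra I'm tacking on (the 0.01 learning rate is just a number I picked) showing what a manual update would look like:

print(a.grad)   #   d(err)/d(a), same shape as a
print(b.grad)   #   d(err)/d(b)

with torch.no_grad():       #   don't track the update itself
    a -= 0.01 * a.grad      #   nudge the values against the gradient
    b -= 0.01 * b.grad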

Brain Overload

There are functions and classes for creating and sampling distributions, shrinking your neural network, data loading and transformation... honestly it's a lot. I thought I had used a reasonable portion of it, but I think I haven't used the majority of what's available. At the least, don't expect to know all of it right away.
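Two quick tastes so the names ring a bell later; these are real torch modules, but you don't need them for the rest of this page:

import torch
from torch.distributions import Normal
from torch.utils.data import TensorDataset, DataLoader

dist = Normal(loc=0.0, scale=1.0)   #   a normal distribution you can sample from
samples = dist.sample((5,))         #   5 random draws

xs = torch.randn(100, 2)            #   some fake data
ys = torch.randn(100, 1)
loader = DataLoader(TensorDataset(xs, ys), batch_size=10, shuffle=True)
for batchX, batchY in loader:       #   hands you shuffled batches of 10
    pass                            #   training would go here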

Training A Network

Usually people don't manually update individual tensors like that. They make a network class instead.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import numpy as np

class ExampleNetwork(torch.nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super().__init__()
        self.layer1 = nn.Linear(inputSize, hiddenSize)
        self.layer2 = nn.Linear(hiddenSize, outputSize)
        #   when you make layers with nn.Linear() or nn.Conv2d() and assign them to self,
        #   they automagically get registered on the ExampleNetwork
        #   #   net.parameters() will hand you their weights and biases

    def forward(self, x): # this function is called when you put data into the network
        x = F.relu(self.layer1(x))  #   torch.nn.functional (F) has lots of activation functions
        x = self.layer2(x)          #   no activation on the output layer for plain regression like this
        return x

#   make a network
net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)

#   send the network to your device
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")  #   use this one if you dont have cuda
net.to(device)

#   make some data
inputs = torch.tensor([1.0, 2.0])   #   x
target = torch.tensor([5.0])        #   y, shaped like the network's output

#   put the data on the device .__. cumbersome yes
#   #   .to() hands back a new tensor, so you have to reassign
inputs = inputs.to(device)   #   data must be on same device as neural network
target = target.to(device)

#   pick a network stepping function and error function
#   #   Adam adjusts the effective step size for each weight on its own as training goes
optimizer = optim.Adam(net.parameters(), lr=0.001)  
#   #   a loss function... it's common, but technically you don't even need this.
#   #   #   you could compute the error straight from the output and the target yourself
lossFunction = torch.nn.MSELoss()
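#   #   #   e.g. loss = ((output - target) ** 2).mean() would do the same job as MSELoss
#   #   #   (that line is just me illustrating, the MSELoss above is what gets used below)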

#   train your network
numTrainingRounds = 1000
for i in range(numTrainingRounds):
    #   zero the derivatives !!! ALWAYS DO THIS BEFORE CALLING backward() !!!
    #   #   gradients accumulate, so otherwise this round's derivatives pile on top of
    #   #   last round's... it's wrong!
    net.zero_grad()

    output = net(inputs)    #   run data through neural network, to get its output
    loss = lossFunction(output, target) #   compute the error
    print(loss) #   the error number should decrease as it learns
                #   IF IT DOESN'T... SOMETHING IS WRONG. AAAAA!!!

    loss.backward() #   compute derivatives throughout network (backpropagation)
    optimizer.step()    #   tweak network weights based on derivatives computed during backward()
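Once the loss settles down you'll want to actually use the network. Here's a minimal way to do that (the file name is just something I made up):

net.eval()                  #   switch off training-only behavior (dropout, batchnorm, ...)
with torch.no_grad():       #   no derivatives needed just to predict
    prediction = net(inputs)
    print(prediction)

torch.save(net.state_dict(), "example_network.pt")  #   stash the weights (made-up file name)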

And Now No Comments

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import numpy as np

class ExampleNetwork(torch.nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super().__init__()
        self.layer1 = nn.Linear(inputSize, hiddenSize)
        self.layer2 = nn.Linear(hiddenSize, outputSize)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x

net = ExampleNetwork(inputSize=2, hiddenSize=16, outputSize=1)
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")
net.to(device)
optimizer = optim.Adam(net.parameters(), lr=0.001)  
lossFunction = torch.nn.MSELoss()   

inputs = torch.tensor([1.0, 2.0]).to(device)
target = torch.tensor([5.0]).to(device)

for i in range(1000):
    net.zero_grad() 

    output = net(inputs)
    loss = lossFunction(output, target)
    print(loss)
    
    loss.backward()
    optimizer.step()

Bootstrapped

The tutorial in the pytorch docs (pytorch.org/tutorials) isn't so bad. It is a bit thorough though, and the explanations there are a lot more rigorous.

I'm Sorry

And if you went through this page, played with the code, tried the pytorch docs tutorial, and a lot of this stuff still doesn't make sense, you probably ignored the prerequisites and kept going.
That's good spirit. I'm proud of you. It means you are the perfect person to go read the book Grokking Deep Learning. I promise it's worth a few weeks of your time. It changed my life.