These notes are meant for classroom use, but they can also be read offline on their own. We will cover the basics of version control with git, as well as the basics of the web stack.

Git

We will be using a version control tool called git to track changes to our code. We’ll also be using GitHub, an online tool for hosting git repositories.

You should already have git installed. If not, see the official documentation on how to install git on your operating system.

Why Version Control?

  • Keep copies of multiple states of files
    By committing you record a state of the file to which you can go back any time.
  • Create alternative states
    Imagine you just want to try out something, but you realize you have to modify multiple files. You’re not sure whether it will work. With version control you can just create a branch where you can experiment or develop new features without changing the main or other branches.
  • Collaborate in teams
    Nobody wants to send code via e-mail or share via Dropbox. If two people work on a file at the same time it’s not clear how to merge the code. Version control lets you keep your code in a shared central location and has dedicated ways to merge and deal with conflicts.
  • Keep your work safe
    Your hard drive breaks. Your computer is stolen. But your code is safe because you store it not only on your computer but also on a remote server.
  • Share
    You developed something awesome and want to share it. But not only do you want to make it available, you’re also happy about contributions from others!

Version Control with a Central Repository

Types of Version Control: Central Repository

  • A classical client-server model. One server stores the code and all versions.
  • Everybody needs to write to one server.
  • All operations (history, commit, branches) require server connection.
  • The traditional model: CVS, SVN, etc.
  • Pros:
    • Simple
  • Cons:
    • Complex for larger projects
      • Who is allowed to write?
    • Difficult for community projects
      • How do you apply changes that someone outside your team made?

Types of Version Control: Distributed Version Control

Distributed Version Control

  • Everybody has a full history of the repository locally
  • No dedicated server - every node is equal.
    • In practice: often a server is used for one “official” copy of the code. This is a server by convention only; there is no technical difference.
  • Pros:
    • No access issues
      • Make a copy and hack away
      • Ask if partner wants to accept your changes
    • Everything is local
      • Fast!
      • No internet connection required
      • Commit often model (once per feature) - don’t sync all the time.
  • Cons:
    • More complex principle.
    • Extra effort to distinguish between committing and pushing/pulling (synchronizing).

Implementations

  • Centralized
    • CVS
    • SVN
    • Team Foundation Server
  • Distributed
    • git
    • Mercurial
  • We will be using git in this lecture.

git

  • Created by Linus Torvalds, 2005
  • Meaning: British English slang roughly equivalent to “unpleasant person”.
  • git – the stupid content tracker.

I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’. – Linus Torvalds

Why git?

git model

Whiteboard sketch of git with a server. A git repository is essentially a large graph.

git tutorial

This is a quick intro to git, used in combination with GitHub. This is not a complete tutorial, but it covers the most important git features.

We start by configuring git

$ git config --global user.name "YOUR NAME"
$ git config --global user.email "YOUR EMAIL ADDRESS"

Make sure that this is set to your official school address and your correct name!
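To check which identity git will record, you can read the values back with git config. The following is a minimal sketch that uses a throwaway repository and repository-level settings (no --global), so it does not touch your real configuration. The name and address are placeholders.

```shell
# Throwaway repository so that nothing in your real setup is changed
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global the values are stored in this repository's .git/config
git config user.name "Ada Example"
git config user.email "ada@example.edu"

# Read the effective values back
configured_name=$(git config user.name)
configured_email=$(git config user.email)
echo "$configured_name <$configured_email>"
```

Running git config user.name in any repository prints the value that applies there, which is a quick way to confirm the global setting took effect.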

Create a folder for your project

$ mkdir myProject 
$ cd myProject/

Initialize the git repository

$ git init 
Initialized empty Git repository in ../myProject/.git/

What does git do to your file system?

# Let's look at what git creates
$ ls .git/ 
branches  config  description  HEAD  hooks  info  objects  refs

# The interesting stuff is in the config file
$ cat .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true


# More interesting for a project with branches and remotes 
$ cat .git/config 
[core]
       	repositoryformatversion = 0
       	filemode = true
       	bare = false
       	logallrefupdates = true
       	ignorecase = true
       	precomposeunicode = true
[remote "origin"]
       	url = https://github.com/dataviscourse/2016-dataviscourse-website
       	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
       	remote = origin
       	merge = refs/heads/master

Now let’s create a file

$ echo 'Hello World!' > demo.txt
$ cat demo.txt 
Hello World!

Let’s add it to version control

$ git add demo.txt

Let’s look at what is going on with the repository

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   demo.txt

That means: git knows that it’s supposed to track this file, but it’s not yet versioned.
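A compact way to see the difference between an untracked file and a staged one is git status --porcelain, which prints a two-character code per file. The sketch below assumes a fresh throwaway repository created with mktemp; it is not part of the tutorial project.

```shell
# Throwaway repository for the experiment
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'Hello World!' > demo.txt

# Before "git add": git sees the file but does not track it (code "??")
before_add=$(git status --porcelain)
echo "$before_add"

git add demo.txt

# After "git add": the file is staged as a new file (code "A")
after_add=$(git status --porcelain)
echo "$after_add"
```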

Let’s commit the file. Once a file is committed, its state is recorded and you can go back to previous versions any time.

# The -m option specifies the commit message. If you omit it, git opens an editor where you enter the message.
$ git commit -m "Committing the test file" 
[master (root-commit) 3b32255] Committing the test file
 1 file changed, 1 insertion(+)
 create mode 100644 demo.txt

# Did it work?
$ git status
On branch master
nothing to commit, working tree clean

That means that now the file is tracked and committed to git. But it’s still only stored on this one computer!

Next, we change a file and commit it again.

# Now let's change something
$ echo 'Are you still spinning?' >> demo.txt 
$ cat demo.txt 
Hello World!
Are you still spinning?

Let’s check the status of git!

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

      modified:   demo.txt
no changes added to commit (use "git add" and/or "git commit -a")

So git knows that something has changed, but hasn’t recorded it. Let’s commit.

$ git commit -m "Added a line to the demo file" 
On branch master
Changes not staged for commit:
	modified:   demo.txt	

That didn’t work! Git only commits changes that have been staged, so you have to run git add on every changed file before each commit. As a shorthand, the -a option of git commit stages all modified tracked files for you.

$ git commit -a -m "Added a line to the demo file" 
[master 64e1c31] Added a line to the demo file
 1 file changed, 1 insertion(+)

Better. Now, let’s look at what happened up to now

$ git log
commit 64e1c31cff02e568cda9ede94fbc8eeeb9e337ee (HEAD -> master)
Author: alexander.lex@gmail.com <alexander.lex@gmail.com>
Date:   Tue Aug 28 10:25:11 2018 -0600

    Added a line to the demo file

commit 3b32255e5b92b65ed59be3bf20bb8a751c149a1e
Author: alexander.lex@gmail.com <alexander.lex@gmail.com>
Date:   Tue Aug 28 10:19:37 2018 -0600

    Committing the test file

Through this cycle of editing, adding and committing, you can develop software in a linear fashion. Now let’s see how we can create alternate versions.
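Recorded states are only useful if you can get back to them. As a rough sketch, the commands below rebuild the two commits from this section in a throwaway repository and then inspect the older version without changing the working copy. The temporary directory and the identity are assumptions made for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo 'Hello World!' > demo.txt
git add demo.txt
git commit -q -m "Committing the test file"

echo 'Are you still spinning?' >> demo.txt
git commit -q -a -m "Added a line to the demo file"

# One line per commit, newest first
git log --oneline

# Print demo.txt as it was one commit before HEAD
old_version=$(git show HEAD~1:demo.txt)
echo "$old_version"

# Show what changed between the two commits
git diff HEAD~1 HEAD
```

HEAD~1 means "the parent of the current commit"; a commit hash from git log can be used in its place.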

Branching

Now let’s create a branch

$ git branch draft

This created a branch with the name draft. Let’s list all branches.

$ git branch
  draft
* master

We have two branches, draft and master. The * marks the active branch (the one HEAD points to).

The files in your working directory always reflect the active branch. When you switch branches, git changes, adds, or removes files so that the directory matches the target branch.

Let’s switch the branch.

$ git checkout draft
Switched to branch 'draft'

Let’s see if there is something different

$ cat demo.txt 
Hello World!
Are you still spinning?

No - it’s the same! Now let’s edit.

$ echo "Spinning round and round" >> demo.txt 
$ cat demo.txt 
Hello World!
Are you still spinning?
Spinning round and round

And commit

$ git commit -a -m "Confirmed, spinning"
[draft 059daaa] Confirmed, spinning
 1 file changed, 1 insertion(+)

We have now written changes to the new branch, draft. The master branch should remain unchanged. Let’s see if that’s true.

$ git checkout master
Switched to branch 'master'

$ cat demo.txt 
Hello World!
Are you still spinning?

The text we added isn’t here, as expected! Next we’re going to change something in the main branch and thus cause a conflict.

# Writing something to the front and to the end in an editor
$ cat demo.txt 
I am here!
Hello World!
Are you still spinning?
Indeed!

# committing again
$ git commit -a -m "Front and back"
[master 8437327] Front and back
 1 file changed, 2 insertions(+)

At this point we have changed the file in two different branches of the repository. This is great for working on new features without breaking a stable codebase, but it can result in conflicts. Let’s try to merge those two branches.

The git merge command merges the specified branch into the currently active one. master is active, and we want to merge draft into master.

$ git merge draft
Auto-merging demo.txt
CONFLICT (content): Merge conflict in demo.txt
Automatic merge failed; fix conflicts and then commit the result.

The result

$ cat demo.txt 
I am here!
Hello World!
Are you still spinning?
<<<<<<< HEAD
Indeed!
=======
Spinning round and round
>>>>>>> draft

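The first line was merged without problems. The final line, where the two branches offer alternative versions, is a conflict that we have to resolve manually: edit the file so that it contains the text you want, and delete the <<<<<<<, =======, and >>>>>>> marker lines.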

Once this is done, we can commit again.

$ git commit -a -m "Resolved conflict"
[master 4dad82f] Resolved conflict

Everything back in order.

$ git status 
On branch master
nothing to commit, working tree clean
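The whole conflict cycle can be replayed as a script. The following is a condensed sketch in a throwaway repository: it recreates the two diverging branches, lets the merge fail, and then resolves the conflict by keeping both lines. Keeping both is just one possible resolution, chosen here for illustration; the identity and the temporary directory are assumptions.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on your git setup

printf 'Hello World!\nAre you still spinning?\n' > demo.txt
git add demo.txt
git commit -q -m "Base version"

# Change the end of the file on the draft branch
git checkout -q -b draft
echo 'Spinning round and round' >> demo.txt
git commit -q -a -m "Confirmed, spinning"

# Change the same place differently on the main branch
git checkout -q "$main_branch"
echo 'Indeed!' >> demo.txt
git commit -q -a -m "Back"

# The merge stops with a conflict; "|| true" lets the script continue
git merge draft || true

# Resolve by hand: write the desired content without conflict markers
printf 'Hello World!\nAre you still spinning?\nIndeed!\nSpinning round and round\n' > demo.txt
git add demo.txt
git commit -q -m "Resolved conflict"

remaining_markers=$(grep -c '^<<<<<<<' demo.txt || true)
echo "conflict markers left: $remaining_markers"
```

In practice you would make the resolution in an editor rather than with printf; the git add and git commit steps afterwards are the same.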

Ignoring specific files and folders

It’s common that you will have temporary and generated files in your working directory that you don’t want to commit to the repository. For example, IDEs will create folders for cached data, or you might generate JavaScript from TypeScript.

To tell git to not add certain files, file types, or folders you can use a .gitignore file in the root directory of your repository.

Here is an example:

_site/
.sass-cache/
.jekyll-metadata
.idea
Gemfile.lock
*.css.map
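To see whether a pattern does what you expect, git check-ignore reports which paths are ignored. The sketch below uses two of the patterns from the example in a throwaway repository; the file names are made up for the demonstration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two patterns from the example above
printf '_site/\n*.css.map\n' > .gitignore

# Hypothetical generated files plus one file we do want to track
mkdir _site
echo '<html></html>' > _site/index.html
echo '{}' > style.css.map
echo 'body {}' > style.css

# Ignored files do not show up as untracked
untracked=$(git status --porcelain)
echo "$untracked"

# Prints only the paths that are ignored
ignored=$(git check-ignore _site/index.html style.css.map style.css)
echo "$ignored"

# -v additionally shows the .gitignore line that matched
git check-ignore -v style.css.map
```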

These are the basics of git on a local git repository. Now we’ll learn how to sync with other people. This can be done with just git, but we’ll be using GitHub as we’re also using GitHub in the homeworks.

Working with GitHub

First, we’ll create a new repository on GitHub by going to https://github.com/new.

New repo interface on GitHub

Now let’s clone the repository from GitHub.

$ git clone https://github.com/alexsb/demo.git
$ cd demo

Let’s see how the config looks for this one.

$ cat .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = https://github.com/alexsb/demo.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master

This creates a local copy of the GitHub repository, which so far contains only a license and a README. We will just start working with that and commit and push the code to the server. If you’d like to add an existing repository to GitHub instead, see GitHub’s documentation.

# What's currently in the repository?
$ ls
LICENSE    README.md
# Write something to demo.txt.
$ echo 'Hello world' > demo.txt
# Add demo.txt to the repository.
$ git add demo.txt
# Commit the file to the repository.
$ git commit -a -m "added demo file" 
[master 2e1918d] added demo file
 1 file changed, 1 insertion(+)
 create mode 100644 demo.txt

Now it’s time to push it to the server!

$ git push 
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 324 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/alexsb/demo.git
   8e1ecd1..2e1918d  master -> master

We have now committed a file locally and pushed it to the server, i.e., our local copy is in sync with the server copy. Note that the git push command uses the origin defined in the config file. You can also push to other repositories!
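Pushing to a repository other than origin works by naming the remote explicitly. As a sketch under stated assumptions, the example below uses two local bare repositories as stand-ins for servers; the remote name backup and all paths are hypothetical.

```shell
work=$(mktemp -d)
git init -q --bare "$work/origin.git"    # stand-in for the GitHub repository
git init -q --bare "$work/backup.git"    # hypothetical second repository

# Cloning an empty repository prints a warning, which we hide here
git clone -q "$work/origin.git" "$work/local" 2>/dev/null
cd "$work/local"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo 'Hello world' > demo.txt
git add demo.txt
git commit -q -m "added demo file"
branch=$(git symbolic-ref --short HEAD)

# Explicit form: git push <remote> <branch>
git push -q origin "$branch"

# Register a second remote and push the same branch there
git remote add backup "$work/backup.git"
git push -q backup "$branch"

# List all configured remotes with their URLs
git remote -v

local_head=$(git rev-parse HEAD)
backup_head=$(git --git-dir="$work/backup.git" rev-parse "refs/heads/$branch")
echo "$local_head $backup_head"
```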

Next, we will make changes at another place. We’ll use the GitHub web interface to do that.

Once these changes are done, our local repository is out of sync with the remote repository. To get these changes locally, we have to pull from the repository:

$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/alexsb/demo
   2e1918d..5dd3090  master     -> origin/master
Updating 2e1918d..5dd3090
Fast-forward
 demo.txt | 1 +
 1 file changed, 1 insertion(+)
# Let's see whether the changes are here 
$ cat demo.txt 
Hello world
Are you still spinning?
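The out-of-sync situation can be reproduced without GitHub. In this sketch a local bare repository plays the server and a second clone plays the role of the web interface; all directory names are made up. git fetch downloads the new commits without changing your files, which lets you count how far behind you are before pulling.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"     # stand-in for GitHub

# First clone: our working copy (the empty-repository warning is hidden)
git clone -q "$work/server.git" "$work/here" 2>/dev/null
cd "$work/here"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo 'Hello world' > demo.txt
git add demo.txt
git commit -q -m "added demo file"
git push -q -u origin "$(git symbolic-ref --short HEAD)"

# Second clone: somebody (or the web interface) edits the file elsewhere
git clone -q "$work/server.git" "$work/elsewhere"
cd "$work/elsewhere"
git config user.name "Other User"
git config user.email "other@example.com"
echo 'Are you still spinning?' >> demo.txt
git commit -q -a -m "Edit made elsewhere"
git push -q origin HEAD

# Back in our copy: fetch, check how many commits we are missing, then pull
cd "$work/here"
git fetch -q
behind=$(git rev-list --count "HEAD..@{u}")
echo "commits behind: $behind"
git pull -q
last_line=$(tail -n 1 demo.txt)
echo "$last_line"
```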

Other GitHub Features

  • GitHub Issues
    GitHub Issues are a great way to keep track of open tasks and problems. Issues can be referenced and closed from commits.
  • Forking
    Forking makes use of the distributed nature of git while keeping the benefits of a server. When you fork a repository, you make a copy of someone else’s code that you are not allowed to write to. The fork appears in your GitHub account, and you can start editing the code. If you think you improved the code, you can send a “pull request” to the original owner, who can then review your changes and merge them into the main repository. Forking is hence virtually the same as branching, except that it resolves issues with write permissions.

GUI Clients

Getting updates to the homeworks

The homeworks are hosted in a git repository. Every time we release a homework we will just update the repository. You can then pull from that repository to get the latest homework on your computer.

To get the homework repository, run the following:

$ git clone https://github.com/dataviscourse/2018-dataviscourse-homework -o homework
$ cd 2018-dataviscourse-homework

Note that by using the -o homework option we’re not using the default remote origin but a user-defined remote called homework.

Next, create a new repository on GitHub.

Ensure your new repository is private and don’t click the option to “Initialize the repository with a README”.

Run the two commands described on GitHub under the heading “Push an existing repository from the command line”. For my repository these are:

# adding your own repository as a remote 'origin'
$ git remote add origin https://github.com/alexsb/dataviscourse-hw.git
# pushing the changes retrieved from the central HW repo to our own repository
$ git push -u origin master

Now your homework repository is all set!
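The two-remote setup can be tried out without GitHub. The sketch below replaces both GitHub URLs with local stand-in repositories; the paths and the file hw1.txt are made up. It is meant as an illustration of the resulting configuration, not as the exact commands to run for the course.

```shell
work=$(mktemp -d)

# Stand-in for the course's homework repository, containing one commit
git init -q "$work/course"
cd "$work/course"
git config user.name "Course Staff"       # placeholder identity
git config user.email "staff@example.edu"
echo 'Homework 1' > hw1.txt
git add hw1.txt
git commit -q -m "Release homework 1"
branch=$(git symbolic-ref --short HEAD)

# Stand-in for your own private GitHub repository
git init -q --bare "$work/private.git"

# Clone with the remote named "homework" instead of "origin"
git clone -q -o homework "$work/course" "$work/student"
cd "$work/student"

# Add your own repository as "origin" and push to it
git remote add origin "$work/private.git"
git push -q -u origin "$branch"

# Both remotes are now configured
git remote -v
upstream=$(git rev-parse --abbrev-ref "@{u}")
echo "git push now goes to: $upstream"
```

Because of the -u option, a plain git push afterwards targets your own repository, while the course repository stays available under the name homework.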

Committing

While working on homework assignments, periodically run the following:

$ git commit -a -m "Describe your changes"
$ git push

Remember, the git commit operation takes a snapshot of your code at that point in time but doesn’t write to the server.

The git push operation pushes your local commits to the remote repository.

You should do this frequently: as often as you have an incremental, standalone improvement.

Getting new homework assignments

When we release a new assignment we will simply add it to the homework GitHub repository.

To get the latest homework assignments and potential updates or corrections to the assignment, run the following.

$ git pull homework master

Make sure to have all your changes committed before you do that.
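One way to check this before pulling is to look at git status --porcelain, which prints nothing when the working tree is clean. The helper name is_clean below is hypothetical, and the sketch runs in a throwaway repository rather than your homework repository.

```shell
# Hypothetical helper: succeeds only if there is nothing uncommitted
is_clean() {
  [ -z "$(git status --porcelain)" ]
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo 'draft answer' > hw1.txt
git add hw1.txt
git commit -q -m "Start homework 1"

# Right after a commit the tree is clean
if is_clean; then state_after_commit=clean; else state_after_commit=dirty; fi

# Any further edit makes it dirty again
echo 'more work' >> hw1.txt
if is_clean; then state_after_edit=clean; else state_after_edit=dirty; fi

echo "$state_after_commit $state_after_edit"
```

In the homework repository you could then guard the pull with `is_clean && git pull homework master`. Note that this check also treats untracked files as uncommitted changes.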