What is IPFS?

IPFS stands for InterPlanetary File System. It's a peer-to-peer distributed system for storing and accessing files, websites, applications, and data.


What's the Big Deal?

IPFS decentralises content delivery: instead of relying on a single server, you can download the same content from any peer that holds a copy.

  • Supports a resilient internet

If someone attacks Wikipedia's web servers, or an engineer at Wikipedia makes a big mistake that causes their servers to catch fire, you can still get the same webpages from somewhere else.

  • Makes it harder to censor content

Because files on IPFS can come from many places, it's harder for anyone (whether they're states, corporations, or someone else) to block them.

  • Speeds up the web when you're far away or disconnected

If you can retrieve a file from someone nearby instead of hundreds or thousands of miles away, you can often get it faster.


Content Addressing

IPFS retrieves information based on its contents rather than its location, an approach known as content addressing. The address is formed from the content itself, not from the computer and disk location where it is stored.


What does content mean?

A content identifier can point to many different types of data, such as a single small file, a piece of a larger file, or metadata (for example, the date, location, or file size of your digital pictures).

So, an individual IPFS address can refer to the metadata of just a single piece of a file, a whole file, a directory, a whole website, or any other kind of content.
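To make the idea of "a piece of a larger file" concrete, here is a minimal Python sketch. It is a toy model, not how IPFS actually encodes data: the block size, the helper names (address_of, add, cat), the in-memory dict, and the use of a plain hex SHA-256 digest as the address are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose, so a short input splits into several blocks


def address_of(data: bytes) -> str:
    """Toy address: the hex SHA-256 digest of the bytes (not a real IPFS identifier)."""
    return hashlib.sha256(data).hexdigest()


def add(store: dict, data: bytes) -> str:
    """Split data into blocks, store each under its own address, return a root address."""
    block_addresses = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        addr = address_of(block)
        store[addr] = block
        block_addresses.append(addr)
    # The root object is metadata: the ordered list of block addresses.
    root = "\n".join(block_addresses).encode()
    root_addr = address_of(root)
    store[root_addr] = root
    return root_addr


def cat(store: dict, root_addr: str) -> bytes:
    """Look up the root object, then fetch and join the blocks it lists."""
    block_addresses = store[root_addr].decode().split()
    return b"".join(store[addr] for addr in block_addresses)


store = {}
root_address = add(store, b"Hello world")
print(root_address)
print(cat(store, root_address))
```

In this sketch every entry in the store, whether a block or the list of blocks, is reached the same way: by the hash of its own bytes.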


For example, instead of asking one of Wikipedia's computers for a page, your computer uses IPFS to ask lots of computers around the world to share the page with you. It can get the information for you from anyone who has it, not just Wikipedia.

Traditional URLs and file paths identify a file by where it's located:

  • https://en.wikipedia.org/wiki/Aardvark
  • /Users/Alice/Documents/term_paper.doc
  • C:\Users\Joe\My Documents\project_sprint_presentation.ppt


IPFS addresses a file by its content:

/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Aardvark.html

That jumble of letters after /ipfs/ is called a content identifier, and it's how IPFS can get content from multiple places. The content identifier is derived from a cryptographic hash of the content at that address, so the same content always produces the same identifier, no matter who is serving it.
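The reason a hash-based address lets you accept content from anyone can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IPFS protocol: each peer is modelled as a plain dict, fetch and address_of are hypothetical helper names, and a bare hex SHA-256 digest stands in for a real content identifier.

```python
import hashlib
from typing import Optional


def address_of(data: bytes) -> str:
    """Toy address: the hex SHA-256 digest of the content."""
    return hashlib.sha256(data).hexdigest()


def fetch(address: str, peers: list) -> Optional[bytes]:
    """Ask each peer in turn; accept only data whose hash matches the address."""
    for peer in peers:
        data = peer.get(address)
        if data is not None and address_of(data) == address:
            return data
    return None  # nobody had a valid copy


page = b"<h1>Aardvark</h1>"
addr = address_of(page)

offline_peer = {}                                      # has nothing to offer
tampering_peer = {addr: b"<h1>Not an aardvark</h1>"}   # serves altered content
honest_peer = {addr: page}                             # serves the real page

result = fetch(addr, [offline_peer, tampering_peer, honest_peer])
print(result)
```

Because the requester recomputes the hash itself, it does not need to trust any particular peer: altered content simply fails the check and the next peer is tried.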


Cryptographic Hash

A cryptographic hash function takes an input of arbitrary size and returns a fixed-length value. The value and length of the output depend on the hash algorithm used.


Example using Hello world as input:

Hello world

Generated output using SHA-1:

0x7B502C3A1F48C8609AE212CDFB639DEE39673F5E

Generated output using SHA-256:

0x64EC88CA00B268E5BA1A35678A1B5316D212F4F366B2477232534A8AECA37F3C

SHA-256 creates a longer hash because it is a 256-bit hash (64 hexadecimal digits), whereas SHA-1 creates a 160-bit hash (40 hexadecimal digits). The prepended 0x indicates that the hash is represented as a base-16 (hexadecimal) number.
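The two hashes above can be reproduced with Python's standard hashlib module. This is a small sketch; the variable names are illustrative, and the input is assumed to be the exact bytes of Hello world with no trailing newline.

```python
import hashlib

message = b"Hello world"  # exact bytes, no trailing newline

# hexdigest() returns lowercase hex; upper() matches the notation used above.
sha1_hex = hashlib.sha1(message).hexdigest().upper()
sha256_hex = hashlib.sha256(message).hexdigest().upper()

print("SHA-1:  0x" + sha1_hex)
print("SHA-256: 0x" + sha256_hex)

# Each hexadecimal digit encodes 4 bits.
print(len(sha1_hex) * 4, "bits vs", len(sha256_hex) * 4, "bits")
```

Changing even the capitalisation of the input (for example, Hello World) gives entirely different digests, which is why the exact bytes matter.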


Why Use Hashes?

Cryptographic hashes come with several very important characteristics:

  • Deterministic - The same input message always returns exactly the same output hash
  • Uncorrelated - A small change in the message should generate a completely different hash
  • Unique - Infeasible to generate the same hash from two different messages
  • One-way - Infeasible to guess or calculate the input message from its hash
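The first two properties in the list are easy to observe directly. The sketch below uses SHA-256 from Python's hashlib and counts how many of the 256 output bits differ between two inputs; the helper names are illustrative. The last two properties (unique and one-way) are claims about computational infeasibility, so a short script cannot demonstrate them.

```python
import hashlib


def sha256_int(message: bytes) -> int:
    """Return the SHA-256 digest of message as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the two SHA-256 digests differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


# Deterministic: hashing the same message twice gives identical digests.
same = differing_bits(b"Hello world", b"Hello world")

# Uncorrelated: adding one character changes roughly half of the output bits.
changed = differing_bits(b"Hello world", b"Hello world!")

print("identical input:", same, "bits differ")
print("one extra character:", changed, "of 256 bits differ")
```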

Because of these features, a cryptographic hash can identify any piece of data. The hash is effectively unique to the data it was calculated from, and it's short: a hash has a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes. Sending it around the network therefore doesn't take up a lot of resources.
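The fixed-length point can be checked with a short Python sketch. The sha256_of_stream helper and the 64 KiB chunk size are assumptions for illustration, and in-memory byte streams stand in for real files; reading in chunks means even a very large file never has to fit in memory.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 64 * 1024) -> bytes:
    """Hash a readable binary stream chunk by chunk and return the raw digest."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        hasher.update(chunk)
    return hasher.digest()


small = sha256_of_stream(io.BytesIO(b"x"))              # 1 byte of input
large = sha256_of_stream(io.BytesIO(b"x" * 5_000_000))  # 5 million bytes of input

print(len(small), len(large))  # both digests are 32 bytes long
```

With a real file, the same function could be called on the object returned by open(path, "rb").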
