IPFS stands for InterPlanetary File System. It's a peer-to-peer distributed system for storing and accessing files, websites, applications, and data.
What's the Big Deal?
IPFS makes downloading content decentralised.
- Supports a resilient internet
If someone attacks Wikipedia's web servers, or an engineer at Wikipedia makes a big mistake that causes their servers to catch fire, you can still get the same webpages from somewhere else.
- Makes it harder to censor content
Because files on IPFS can come from many places, it's harder for anyone (whether they're states, corporations, or someone else) to block things.
- Speeds up the web when you're far away or disconnected
If you can retrieve a file from someone nearby instead of hundreds or thousands of miles away, you can often get it faster.
Content Addressing
IPFS retrieves information based on its content, an approach known as content addressing. The content itself is used to form the address, rather than information about the computer and disk location where it is stored.
What does content mean?
A content identifier can point to many different types of data, such as a single small file, a piece of a larger file, or metadata (for example, the date, location, or file size of a digital picture).
So an individual IPFS address can refer to the metadata of just a single piece of a file, a whole file, a directory, a whole website, or any other kind of content.
For example, instead of asking one of Wikipedia's computers for a page, your computer uses IPFS to ask lots of computers around the world to share the page with you. It can get the information for you from anyone who has it, not just Wikipedia.
Traditional URLs & file paths identify a file by where it's located:
https://en.wikipedia.org/wiki/Aardvark
/Users/Alice/Documents/term_paper.doc
C:\Users\Joe\My Documents\project_sprint_presentation.ppt
IPFS addresses a file by its content:
/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Aardvark.html
That jumble of letters after /ipfs/ is called a content identifier, and it's how IPFS can get content from multiple places. The content identifier is a cryptographic hash of the content at that address. The hash is unique to the content that it came from.
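To make the idea concrete, here is a minimal sketch of a content-addressed store in Python. It is a toy, not how IPFS is implemented: the `ContentStore` class and its `put`/`get` methods are hypothetical names invented for this illustration, and the address is a plain SHA-256 hex digest rather than a real IPFS content identifier.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that keys each blob by the SHA-256 hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived only from the content, not from where it is stored.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone can re-hash what they received to check that it matches the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


# Two independent "peers" that hold the same page derive the same address for it.
peer_a = ContentStore()
peer_b = ContentStore()
page = b"<html>Aardvark</html>"
address_a = peer_a.put(page)
address_b = peer_b.put(page)

print(address_a == address_b)          # same content, same address
print(peer_b.get(address_a) == page)   # any peer holding the content can serve it
```

Because the address depends only on the bytes, it does not matter which peer answers the request: the requester can verify the result by hashing it again.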
Cryptographic Hash
Cryptographic hash functions take an arbitrary input and return a fixed-length value. The exact output depends on the hash algorithm in use.
Example with Hello world as input:
Hello world
Generated output using SHA-1:
0x7B502C3A1F48C8609AE212CDFB639DEE39673F5E
Generated output using SHA-256:
0x64EC88CA00B268E5BA1A35678A1B5316D212F4F366B2477232534A8AECA37F3C
SHA-256 creates a longer hash because it is a 256-bit hash, whereas SHA-1 creates a 160-bit hash. The prepended 0x indicates that the hash is represented as a base-16 (hexadecimal) number.
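The hashes above can be recomputed with Python's standard `hashlib` module. This is a small sketch; the helper name `hex_digest` is made up for the example, and it formats the digest as uppercase hexadecimal with a `0x` prefix so the result can be compared with the values listed above.

```python
import hashlib

message = b"Hello world"


def hex_digest(algorithm: str, data: bytes) -> str:
    """Hash `data` with the named algorithm and format it as 0x-prefixed uppercase hex."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return "0x" + digest.upper()


sha1_hex = hex_digest("sha1", message)
sha256_hex = hex_digest("sha256", message)

print("SHA-1:  ", sha1_hex)
print("SHA-256:", sha256_hex)

# Digest sizes in bits: each hex character encodes 4 bits.
print((len(sha1_hex) - 2) * 4, "bits")
print((len(sha256_hex) - 2) * 4, "bits")
```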
Why Use Hash?
Cryptographic hashes come with several very important characteristics:
- Deterministic - The same input message always returns exactly the same output hash
- Uncorrelated - A small change in the message should generate a completely different hash
- Unique - Infeasible to generate the same hash from two different messages
- One-way - Infeasible to guess or calculate the input message from its hash
Because of these features, a cryptographic hash can identify any piece of data: the hash is unique to the data it was calculated from, and it is short (a hash has a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes), so sending it around the network doesn't take up many resources.
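Some of these characteristics can be observed directly. The sketch below uses SHA-256 from Python's `hashlib`; the helper names `sha256` and `differing_bits` are invented for this illustration. It checks that hashing is deterministic, counts how many output bits change when one letter of the input changes case, and shows that a much larger input still yields a 32-byte hash. The one-way and uniqueness properties are about computational infeasibility, so they cannot be demonstrated by a short script.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = sha256(b"Hello world")
again = sha256(b"Hello world")
changed = sha256(b"Hello World")      # only the case of one letter differs
large = sha256(b"\x00" * 1_000_000)   # one megabyte of input

is_deterministic = first == again
bits_changed = differing_bits(first, changed)

print("deterministic:", is_deterministic)
print("bits changed out of 256:", bits_changed)
print("digest length for large input:", len(large), "bytes")
```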