diff --git a/chap5.md b/chap5.md new file mode 100644 index 0000000..41bf505 --- /dev/null +++ b/chap5.md @@ -0,0 +1,246 @@ +# Chapter 5: Amazon Simple Storage Service (S3) + +## Introduction to storage options on AWS + +Different use cases require different storage architecture for performance, reliability, durability. + +3 primary storage options: +* Block storage +* File storage +* Object storage + +### Block storage + +**Block storage** = an storage architecture design that enable store data onto media (e.g. hard disk) in fixed size chunks (i.e. each block has same size) +* Working principle: data is broken into small blocks and placed on the media, each block has a corresponding unique address assigned. These addresses are saved in metadata +* Data Retrieval: management software identify the blocks to retrieve, reassemble data into file to be represented. +* AWS service to offer block storage: + * **Elastic Block Store (EBS)**, can be configured as volumne attached to EC2 +* Characterstic of block storage: ultra-low latency required for high-performance workload +* EBS are not directly attached to EC2, but mounted as storage as EBS volume via high-speed network links +* Use cases: + * system files used by ES2 OS + * used as **DAS (direct-attached storage)** or **storage area network (SANs)** + +### File storage + +**File storage** = architecture offers a centralized location for corporate data +* Offer *hierarchical structure* to store data (i.e. in folders and sub-folders) +* Data Retrieval: need to know file location and folder structure +* Use cases: + * corporate file sharing +* AWS service: + * **Elastic File System (EFS)** = managed elastic filesystem designed to share file data without provisioning or managing storage. + * Filesystem can grow or shrink based on need + * Support mount EFS on Linux EC2 instances (not Windows) + * **FSx for Lustre** = High-performance filesystem desgined for GB throughput and millions of IOPS (input/output operations per seconds) + * Support Linux-based EC2 instances + * **FSx for Windows File Server** = designed for Microsoft Windows EC2 instances. +* Con: folder structure sometime reduce performance of R/W + +### Object storage + +**Object storage** = storing complete files as individual objects. +* **Unstructured data**: Object storage is a *flat file structure*. Objects are stored without folder or file-level hierarchy. +* Object storage metadata (e.g. name, etc) and other attributes are used to create unique identifier to locate the object +* Benefit of using object storage: + * metadata: rich information within metadata enable easy data analytics than other storage architectures + * high performance, durability and scalability. As it does not have file structure. +* AWS service: + * **S3**: AWS let you create container called **bucket**, to store objects +* Use cases: + * Store digital assets for web/app (e.g. document, images, video) + +## Introduction to Amazon S3 + +Amazon S3 offer up to 99.99999999999% durability (eleven 9s) + +### Buckets and objects + +* *Step before upload data to Amazon S3*: create a bucket (i.e. container) +* A bucket need a *unique global namespace* + * unique across AWS ecosystem + * Hard to find common names. Usually combined with corpo name +* Bucket content can be directly accessed over public internet (after configuration) via two ways: + * **Virtual hosted-styled endpoints**: + * S3 **Uniform Resource Locator (URL)** consists of 2 parts + * part 1: **Domain Name System (DNS)** subdomain name, which contains bucket name + * part 2: object name + * e.g. `https://just-desserts.s3.amazonaws.com/blueberry-muffin.txt` where `just-desserts.s3.amazonaws.com` is DNS subdomain name + * It need DNS config + * **Website endpoint**: bucket is configured with a static web hosting service, and available at AWS Region-specific endpoint: + * format 1: **s3-website** dash (-) Region + * `http://.s3-website-Region.amazonaws.com` + * format 2: **s3-website** dot (.) Region + * `http://.s3-website.Region.amazonaws.com` +* You cannot have nested bucket (i.e. no bucket within bucket) + * Each bucket has one URI + +Object storage: +* An object is stored in entirety, instead of broken into chunks like block storage +* Maximum size of one object (5 TB) + + +Key & Value: +* **Key** of object = filename of the object +* **Value** = data contained in object + +### Managing your objects in a bucket + +S3 bucket is flat filesystem. But object name can mimick file structure using **prefix** & **delimiter (/)**. It still has ONE SINGLE object name. + +e.g. Amazon S3 prefix and delimiter + +![Figure 5.1 - Amazon S3 prefixes and delimiter example](imgs/B17124_05_01.jpg) + +Where: +* **packt-marketing** is bucket name +* **campain** & **cloud-practitioner** are sub-folder name + +Benefit of prefix: +* Help to limit search results +* Enhance access performance + +### Regional hosting - global availability + +* After configuration, buckets & objects are globally accessible. +* But S3 buckets are stored in Regions, which impact latency and cost for accessing resources +* AWS does not replicate data outside the chosen Region + +### Access permissions + +S3 has 2 primary methods of granting access to resources: +* **bucket policy** +* **access-control lists (ACLs)**: legacy + +#### Bucket policies + +* Bucket policy is applied to entire bucket and objects within it +* policy document written in JSON +* Bucket policies grant cross-account access (i.e. user or app in other AWS account) by specifying AWS ID or IAM user **Amazon Resource Name (ARN)** from the other AWS account + +How to define policy: similar to IAM policy, you need define attributes within `Statement` +* `"Action"`: action to be taken on the bucket (e.g. `s3:GetObject`) +* `"Effect"`: Allow or Deny +* `"Resource"`: Bucket ARN, wildcard `(*)` can apply +* `"Principle"`: if given `*`, it means available for anyone +* `"Condition"`: conditional policy to be added on bucket, to create filter + +e.g. of typical bucket policy granting anonymous access to contained objects + +![Figure 5.2 - Bucket policy example granting anonymous access to the contents of the 'packt-marketing' S3 bucket](imgs/B17124_05_02.jpg) + +e.g. bucket policy of conditional statement + +![Figure 5.3 - Bucket policy defined with a conditional statement](imgs/B17124_05_03.jpg) + +#### IAM policies + +Other than bucket policies and ACLs (outdated), you can create IAM policies to be assigned to IAM identities (e.g. IAM users, groups, or roles). + +Problem of IAM policies: +* IAM policies cannot be attached directly to resources. +* IAM policies cannot be attached to IAM user from ANOTHER AWS account. +* IAM policies cannot be used to grant anonymous access + +Solution: +1. Create an IAM role with necessary permissions +2. Enable trust policy for other AWS account's IAM users to assume this role + +#### Choosing the right S3 storage class + +S3 charges are based on 6 cost components: +1. Storage (amount of storage, duration, storage class) +2. Requests and data retrievals +3. Data transfer +4. User of transfer acceleration +5. Data management +6. Analytics +7. (optional) Use of S3 Objct Lambda: used to modify and process data during data extraction using Lambda function + +A single bucket can contain any type of data. No need to create separate buckets for each storage class + +Different storage classes: +* Frequent access + * **S3 Standard**: default storage class, detailed attributes are in table below +* Infrequent access + * **S3 Standard-Infrequent Access (S3 Standard-IA)** + * **S3 One Zone-Infrequent Access (S3 One Zone-IA)** +* Archive storage + * **Amazon Glacier** + * **Amazon Glacier Deep Archive** +* Unpredictable access patterns + * **Intelligent-Tiering**: another storage class offered by AWS. Object are automatically transitioned across 4 different tiers +* S3 on Outposts + +Figure below show S3 storage performance + +![S3 storage class performance](imgs/Table_5.2_a.jpg) + +Figure below show S3 storage class key attributes + +![S3 storage class key attributes](imgs/Table_5.2_b.jpg) + +### Versions + +**S3 Versioning** +* Feature offered by S3 service +* Disabled as default when the bucket is created +* Once enabled + * uploaded will be tagged with a new version ID, if there is existing object key (i.e. name) + * Delete an object simply add a delete marker to the object, and hide from S3 console. Hence, object can be recovered by delete the delete marker + +3 state of bucket in terms of versioning: +* Unversioned (default) +* Versioning-enabled +* Versioning-suspended + +Once enable versioning, you cannot return to Unversioned state, but you can suspend versioning + +### Cross-Region and same-Region replication + +By default, AWS never copy your data out of your Region, but AWS offer provide service to do so: +* **Amazon Cross-Region Replication (CRR)**: + * copy object acorss AWS buckets in different Regions *async* + * Use cases: + * **Reduce latency** + * **Increase operation efficiency**: app in different Region need access to same dataset + * **Meet regulatory and compliance requirement** +* **Same-Region Replication (SRR)**: + * provide replication service btw buckets in same Regions + * Use cases: + * **Log Data Aggregation**: log data can be saved in a single management bucket via replication + * **Replicating between development and production accounts**: Move objects from development AWS account to production AWS account. + * **Compliance requirements**: required to maintain multiple copies of data to adhere to data-residency laws. + +Note: versioning must be enabled to set up CRR or SRR + +**S3 Life cycle Management**: +* solution provided to manage vast quantities of data, when there is need to manage life-cycle of objects (upload -> maintain -> destroy) +* S3 Life cycle Management *actions* can be attached to buckets or subset of data +* 2 category of actions: + * **Transition actions**: move objects from one storage class (e.g. S3 -> Glacier) to another after a certain period of time + * **Expiration actions**: delete objects from S3 storage system after a set number of days. (e.g. delete after 1 year) + +### S3 encryption + +* All data uploaded to Amazon S3 is encrypted *in transit* using HTTPS protocol. But after transition, stored data are unecrypted by default. +* **Encryption at rest**: encrypt data before storing in S3. + +AWS offers 3 options for encrypting data at rest +1. **Server-side encryption**: When you upload (create) an object, S3 encrypts it before saving it to disk. When you download it, it's decrypted by S3 service. S3 have 3 options + 1. **Server-side encryption** with **Amazon S3-managed keys (SSE-S3)** - Amazon encrypts your data with a 256-bit **Advanced Encryption Standard (AES-256)** + 2. **Server-side encryption** with **customer master key (CMKs)** stored in AWS **Key Management Service (SSE-KMS)** + 3. **Server-side encryption** with **customer provided key (CPKs) (SSE-C)** stored in AWS +2. **Client-side encryption** + +### Static website hosting + +### Amazon S3TA + +* Problem to solve: host S3 bucket in a specific Region, but require users across the globe to upload objects to it. +* Solution: S3TA routes your uploads via Amazon CloudFront's globally distributed edge locations and AWS backbone networks. + +[S3TA speed checker website](https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html) + +## Learning about archiving solutions with Amazon S3 Glacier \ No newline at end of file diff --git a/chap5_qa.md b/chap5_qa.md new file mode 100644 index 0000000..035b734 --- /dev/null +++ b/chap5_qa.md @@ -0,0 +1,102 @@ +# Questions + +## Q1 + +Which of the following is true regarding Amazon S3? (Select 2 answers) + +1. Amazon S3 is object-based storage. +2. Amazon S3 is an example of file storage. +3. Amazon S3 is an example of block storage. +4. The Amazon S3 One Zone-IA storage class offers 99.5% of availability. Amazon S3 can be configured as shared mount volumes for Linux-based EC2 instances. + +ANS: 1 & 2 (Correct) + +## Q2 + +You wish to enforce a policy on an S3 bucket that grants anonymous access to its content if users connect to the data from the corporate and branch offices as part of your security strategy. Which S3 configuration feature will enable you to define the IP ranges from where you will allow access to the data? + +1. Security groups +2. Bucket policy +3. NTFS permissions +4. Network ACLs (NACLs) + +ANS: 2 (Correct) + +## Q3 + +Which AWS service is the most cost-effective if you need to host static website content for an upcoming product launch? + +1. Amazon EC2 +2. Amazon EFS +3. Amazon S3 +4. Azure ExpressRoute + +ANS: 3 (Correct) + +## Q4 + +Which Amazon S3 storage class enables you to optimize costs by automatically moving data to the most cost-effective access tier, while ensuring that frequently accessed data is made available immediately? + +1. Amazon S3 Standard +2. Amazon S3 One-Zone IA +3. Amazon Snowball +4. Amazon S3 Intelligent-Tiering + +ANS: 4 (Correct) + +## Q5 + +Which Amazon S3 service can be configured to automatically migrate data from one storage class to another after a set number of days as a means of reducing your costs, especially where frequent instant access may not be required to that subset of data? + +1. Static website hosting +2. Lifecycle management +3. Storage transition +4. S3 migration + +ANS: 2 (Correct) + +## Q6 + +When retrieving data from Amazon Glacier, what is the typical time taken by a Standard retrieval option to make the archive available for download? + +1. 20 minutes +2. 24 hours +3. 3 to 5 hours +4. 90 seconds + +ANS: 2 + +Incorrect: should be 3 + +## Q7 + +Which feature of the Amazon S3 platform enables you to upload content to a centralized bucket from across any location via Amazon edge locations, ensuring faster transfer speeds and avoidance of public internet congestion? + +1. Amazon S3TA +2. AWS S3 Storage Gateway +3. Amazon VPC +4. CloudFront + +ANS: 1 (Correct) + +## Q8 + +Your on-premises applications require access to a centrally managed cloud storage service. The application running on your servers need to be able to store and retrieve files as durable objects on Amazon S3 over standard NFS-based access with local caching. Which AWS service can help you deliver a solution to meet the aforementioned requirements? + +1. AWS Storage Gateway— Amazon S3File Gateway +2. AWS EFS +3. Amazon Redshift +4. EBS volumes + +ANS: 1 (Correct) + +## Q9 + +You are looking to migrate your on-premises data to the cloud. As part of a one-time data migration effort, you need to transfer over 900 TB of data to Amazon S3 in a couple of weeks. Which is the most cost-effective strategy to transfer this amount of data to the cloud? + +1. Use the Amazon RDS service +2. Use the Amazon Snowball service +3. Use the Amazon VPN connection between your on-premises network and AWS +4. Use AWS Rain + +ANS: 2 (Correct) \ No newline at end of file diff --git a/imgs/B17124_05_01.jpg b/imgs/B17124_05_01.jpg new file mode 100644 index 0000000..f4a0d43 Binary files /dev/null and b/imgs/B17124_05_01.jpg differ diff --git a/imgs/B17124_05_02.jpg b/imgs/B17124_05_02.jpg new file mode 100644 index 0000000..ce523ea Binary files /dev/null and b/imgs/B17124_05_02.jpg differ diff --git a/imgs/B17124_05_03.jpg b/imgs/B17124_05_03.jpg new file mode 100644 index 0000000..90fc9bd Binary files /dev/null and b/imgs/B17124_05_03.jpg differ diff --git a/imgs/Table_5.1.jpg b/imgs/Table_5.1.jpg new file mode 100644 index 0000000..21c3547 Binary files /dev/null and b/imgs/Table_5.1.jpg differ diff --git a/imgs/Table_5.2_a.jpg b/imgs/Table_5.2_a.jpg new file mode 100644 index 0000000..5d8176c Binary files /dev/null and b/imgs/Table_5.2_a.jpg differ diff --git a/imgs/Table_5.2_b.jpg b/imgs/Table_5.2_b.jpg new file mode 100644 index 0000000..403d0a3 Binary files /dev/null and b/imgs/Table_5.2_b.jpg differ