AWS BigData Specialty Certification: Preparation, Materials and the Exam

Why AWS, and not another cloud provider's certification?

According to the regularly published Gartner report [1], AWS remains the leader in the global cloud competition. The number of clients already on AWS is another good reason to choose exactly this certification. It would be wrong to say that the exam topics are purely AWS-specific and cannot be applied to on-premise solutions or other cloud providers' infrastructure. Some services exist only on AWS, but the overall approach, the principles and fundamentals, and the majority of components are cloud-agnostic. So switching from one cloud provider's stack to another should not be an issue, and this knowledge is fully reusable. Preparing for and passing the certification is a win-win in this case.

It is a BigData exam, and the number of topics is BIG

The exam's content is under NDA, so I will not retell the questions here, but I want to highlight the topics and the general approach used on the exam. The topic areas are the same as described in the AWS exam guide: Streaming, Processing, Analytics, Security, Visualization, IoT and Machine Learning. You can check each topic's percentage share of the exam on the guide page.

You should know Kinesis, Spark Streaming, and DynamoDB with Streams and DAX caching, and how they integrate with each other. Know how to calculate capacity units for eventually consistent and strongly consistent reads in DynamoDB, and the per-shard limits of a Kinesis stream.
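As a quick illustration of the capacity arithmetic mentioned above, here is a minimal Python sketch. It assumes the commonly documented rules: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), one write capacity unit covers one write per second of an item up to 1 KB, and a Kinesis shard accepts up to 1 MB/s or 1,000 records/s on ingest and serves up to 2 MB/s on read. The function names are made up for this example and are not part of any AWS SDK, and the limits should be checked against the current AWS documentation.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate provisioned RCUs (assumed rule: 1 RCU = one 4 KB strongly consistent read/s)."""
    # Item size is rounded up to the next 4 KB block for every read.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much; round up to a whole unit.
        total = math.ceil(total / 2)
    return total


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate provisioned WCUs (assumed rule: 1 WCU = one 1 KB write/s)."""
    # Item size is rounded up to the next 1 KB block for every write.
    return math.ceil(item_size_kb) * writes_per_second


def kinesis_shards_needed(ingest_mb_per_s, records_per_s, egress_mb_per_s):
    """Estimate the shard count from the assumed per-shard limits."""
    return max(
        math.ceil(ingest_mb_per_s / 1),   # 1 MB/s write per shard
        math.ceil(records_per_s / 1000),  # 1,000 records/s write per shard
        math.ceil(egress_mb_per_s / 2),   # 2 MB/s read per shard
    )


if __name__ == "__main__":
    print(read_capacity_units(6, 10, strongly_consistent=True))
    print(read_capacity_units(6, 10, strongly_consistent=False))
    print(write_capacity_units(1.5, 100))
    print(kinesis_shards_needed(5, 3000, 8))
```

For example, ten reads per second of 6 KB items need 20 units when strongly consistent and 10 when eventually consistent, because each 6 KB item rounds up to two 4 KB blocks. The sketch ignores transactional requests and on-demand capacity mode.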

Hadoop and MapReduce have been around for well over a decade, and you can still meet them in the technical stack of lots of companies, so having this knowledge is desirable. In the AWS infrastructure they are represented by the EMR service, which integrates with both HDFS and the S3 storage system (EMRFS instead of classic HDFS volumes). S3 is also widely covered on the exam, so review storage classes, lifecycle rules, Glacier vaults and security. RDS, ElastiCache, Redshift DWH data ingestion and partitioning, Glue, Lambda integration and troubleshooting, Hive and Kinesis Analytics come up too.

Check how QuickSight dashboards integrate with other services and which chart types suit particular business use cases. Review the IoT infrastructure and its integration with other services. Know the security of each service (which encryption at rest and in transit is applied), integration with AWS VPC, IAM and its limits, integration with IdPs and federation, HSM, and RBAC for the DWH based on open-source tools like Apache Ranger.

And of course, this is not the complete list…

The questions are mostly situational

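Be ready to see a lot of text in the questions. Some are 4-5 sentences long and include values and metrics. A question describes a situation or business case to solve and offers several answers to evaluate. Such situational questions are more interesting than obvious YES/NO ones. The certification center provides you with markers and paper for notes, so while reading the question and working out the architecture it describes, it is a good idea to draw diagrams. I also noted keywords that could narrow the choice of components, e.g. real time is required, a data delay of one month is acceptable, historical data is needed, etc. The answers are multiple-choice, and you need to pick the most appropriate solution among them.

There are questions where every answer could work. In such situations it is important to understand the root of the question, that is, the challenge it asks you to solve. You will see markers such as "the solution with the LEAST time to implement" or "the solution that covers all defined aspects", etc.

Another interesting aspect of the exam is that it not only tests your knowledge of each service in isolation but also verifies your understanding of how the components interact in a chain. That is closer to real life.

Knowledge of and hands-on experience with the open-source stack is a bonus in preparation

It is no secret that public cloud providers use different distributions of the Big Data stack underneath, like Hortonworks, MapR and Cloudera. AWS, for example, uses the Amazon and MapR distributions for EMR, the Athena engine is based on Presto, etc. So if you had a chance to work with some of these products in the past, it is an extra point for you.

But AWS BigData also relies on some AWS-specific services and tools that are not cloud-agnostic, so knowing only the open-source tools is not enough. And of course, you should understand the infrastructure, its security implementation, and each component's place in the global AWS landscape.

The previous certifications can complement your knowledge

AWS-specific knowledge from previous exams, especially of components like S3, Kinesis, SQS, DynamoDB, Lambda, VPC Endpoints, CloudWatch and Security Groups, will be a good bridge to learning and understanding the BigData stack in more depth.

But be ready for this Specialty exam to have much deeper questions and a wider range of material. FYI: my preparation notes for the previous Architect, Developer and SysOps exams all fit in a single notebook, with free space left over. For the BigData exam I have a second notebook of the same size that is completely filled with notes and diagrams.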

Knowledge and hands-on experience with open-sourced stack is a bonus in preparation

It is not a secret that all public cloud providers are using different distributions of Big Data Stack underneath, like Hortonworks, MapR, Cloudera. AWS f.e. uses Amazon and MapR distribution for EMR, Athena engine is based on Presto, etc. So if you had a chance to work with some of these products in the past - it is an extra point for you.

But some non-agnostic and aws-specific services and tools are in use on AWS BigData, so knowledge of open-source tools only is not enough. And of course, you should understand the infrastructure with security implementation and each component’s place in the global AWS landscape.

The previous certifications could compliment your knowledge

AWS-specific knowledge from previous exams especially for components like S3, Kinesis, SQS, DynamoDB, Lambda, VPC Endpoint, CloudWatch, SecurityGroups will be a good bridge in moving to learn and understand deeper BigData stack.

But be ready that this Specialty exam has much deeper questions and the area of material is wider. FYI: my preparation notes from the previous Architect, Developer and SysOps Exams were written in the single notebook and there is even extra free space for notes. For BigData Exam I have the same size second notebook that is fully filled with all notes and diagrams.

Materials and resources for preparation

In my opinion, Frank Kane's BigData course, which is based on the Hortonworks distribution, can be a good prerequisite before going deeper into the AWS-specific BigData stack. Very popular training resources with AWS-related content are CloudGuru, Udemy and O'Reilly. I enjoyed the way Stephane Maarek and Frank Kane deliver the material. I'm also a big fan of the visual associations that develop in our brain, so for me good diagrams and visualization work better than just text with a voice in the background.

Unfortunately, at this moment there is no AWS practice test for the Big Data Specialty where you can try out and get a feeling for the questions and timing (there are lots of practice tests for other AWS exams). So there is no clear benchmark to tell whether you are prepared enough or not. Just be ready for the questions on the exam to be harder and deeper than those in the CloudGuru or O'Reilly exam courses.

AWS has an excellent open library of blueprints and white papers (the link is in the «Resources» section below). It is good to read these resources not only for exam preparation but also for everyday practice. Hands-on activity is the glue that structures your knowledge better and builds a deeper understanding, down to the level of muscle memory.

Extra +30 mins: tip for non-native speakers

The exam is offered in several languages, and if you are not a native speaker you can ask for an extra 30 minutes to be added to your exam time. To do this, log in to your Amazon Training and Certification account, open Exam Accommodations, and request ESL +30 MINUTES. Once it is approved, you will have an extra 30 minutes for each exam you schedule in the future. So first submit the ESL +30 MINUTES request and wait for its approval, and only then schedule the exam. With this option enabled you will have 200 minutes for the AWS BigData Specialty exam.

Based on my personal experience and feedback from other people, the exam time is very tight for answering all the questions. So the extra 30 minutes are useful for reviewing your answers or thinking over the hard questions.


I hope this review and these tips will be useful for those of you who are on your way to certification. It would be interesting to hear your own story of preparing for and passing the exam as well, so feel free to share it in the comments.

Good luck with your certification!!!


Resources

  1. Gartner report
  2. AWS guide BigData exam
  3. AWS sample questions
  4. AWS White Paper on Big Data Analytics
  5. AWS White Paper Kinesis
  6. EMR security
  7. EMR best practices
  8. AWS well architected framework
  9. Redshift encryption
  10. AWS Machine Learning
This post is licensed under CC BY 4.0 by the author.