Data Analyst

Level:

Mid – Senior

We are looking for Data Analysts to manage the full analysis lifecycle: gathering requirements, planning activities, designing solutions, and building reporting capabilities. They will also monitor performance and quality control plans to identify opportunities for improvement.

Main duties and responsibilities:

    ● Build data transformation and/or ETL pipelines through both tool-based and code-based development, e.g., Informatica or Talend for tool-based work, and the Spark or MapReduce frameworks in Python, Scala, or Java for code-based work
    ● Design and maintain data systems and databases, including fixing coding errors and other data-related problems
    ● Prepare reports for executive leadership that effectively communicate trends, patterns, and predictions using relevant data
    ● Collaborate with programmers, engineers, and organizational leaders to identify opportunities for process improvements, recommend system modifications, and develop policies for data governance
    ● Create documentation that allows stakeholders to understand the steps of the data analysis process and replicate the analysis if necessary
    ● Understand and develop ANSI SQL queries

Qualifications:

    ● Bachelor’s degree in Computer Science/IT/Computing or equivalent
    ● Well-versed in SQL
    ● Experience with the Apache Flink streaming framework
    ● Experience using cloud technologies (AWS Redshift, S3, etc.)
    ● Experience in any of the following languages: Python, Java, Scala
    ● Experience with Talend, Informatica, Ab Initio, or other GUI-based ETL tools
    ● Experience with end-to-end DevOps practices and Git version control is an advantage
    ● Experience with Docker and other container technologies is an advantage
    ● Must understand the small-files problem in HDFS, columnar storage formats (Parquet and ORC), and row-based serialization formats used for data transmission (Avro)
    ● Must be able to understand resource allocation and proper resource sizing for distributed jobs (e.g., Spark executor and driver sizes)
    ● Comfortable working with Hive partitioning and bucketing