


Data Engineer


Regulatory DataCorp | Information Technology

Job Summary

RDC is looking for a Data Engineer to join a team dedicated to building best-in-class machine learning solutions that protect the world’s financial systems. As part of the Architecture team, the Data Engineer will manage data so that other teams and systems (e.g., Data Science, Development, and Product) can easily use it to develop and enhance stable, scalable software solutions.

Essential Duties and Responsibilities

·       Architect, design, implement, monitor, and maintain big data and ETL/ELT pipelines

·       Gather data requirements, capture and maintain technical/operational/business metadata

·       Source data from different systems

·       Store data using the optimal technology (e.g., SQL, NoSQL, HDFS, S3) for the particular use case

·       Prepare data for analysis by performing data wrangling/munging

·       Cleanse data

·       Convert data from one format to another

·       De-duplicate data

·       Discover opportunities for data acquisition and pick the right tools to collect and analyze such datasets in batch and/or real-time

·       Recommend and implement methods to improve data governance, security, reliability, efficiency and quality

·       Implement best practices around data modeling, data partitioning and data backfilling on new and existing data

·       Help the team ensure compliance with all regulatory requirements related to data privacy

·       Work closely with the Architecture, Data Science, and Tech-Ops teams to ensure efficient and effective delivery of data solutions

·       Interface with Software Engineers, Product Managers, and Business Analysts to understand goals and data needs, and to implement data-driven features/products

Skills & Requirements


·       Bachelor’s degree in Computer Science or related field

·       At least 3 years of relevant data engineering experience building, testing, and maintaining data architectures

·       Strong background in software engineering, with 3+ years of experience writing production code in POSIX shell or Bash (or similar), plus at least one other dynamic language (e.g., Perl, Ruby, Python); Java experience desirable but not required

·       Knowledge of character encodings (ASCII, UTF-8, UTF-16, UCS-2, etc.)

·       2-3 years of professional experience with:

o   Unix shell scripting and tool building

o   ETL/ELT tools and approaches

o   SQL databases (MSSQL, PostgreSQL) and NoSQL databases (MongoDB)

o   Data serialization formats such as Apache Avro, Apache Parquet, JSON, CSV, YAML, and XML

o   Elasticsearch/Kibana or a similar distributed search and analytics engine

·       Desirable but not essential:

o   Amazon Web Services (EMR, S3, Glue, IAM, ECS)

o   Kafka, Spark Streaming or a similar real-time stream processing framework

o   Spark ecosystem (Dataframes, MLlib, SparkSQL) & Hadoop ecosystem (HDFS, Hive)

o   ActiveMQ, RabbitMQ or a similar messaging system

o   Databricks/AWS or a similar web-based platform for working with Spark and other Big Data tools

o   Apache Oozie, Apache Airflow, Luigi or a similar workflow management system

·       Demonstrated proficiency with:

o   Unix/Linux OS

o   Database Management Systems

o   Distributed Systems

o   Big Data concepts/tools

·       Curious, self-driven, analytical

·       Demonstrated ability to work with ambiguous requirements, adapt, and learn

·       Excellent verbal and written communication skills