Amazon SageMaker Ground Truth significantly reduces the cost and time required for labeling data by integrating human annotators with machine learning to automate the labeling process. You can use SageMaker Ground Truth to create labeling jobs, which are workflows where data objects (such as images, videos, or documents) need to be annotated by human workers. These labeling jobs are distributed among a workteam, a group of workers assigned to perform the annotations. To access the data objects they need to label, workers are provided with Amazon S3 presigned URLs.
A presigned URL is a temporary URL that grants time-limited access to an Amazon Simple Storage Service (Amazon S3) object. In the context of SageMaker Ground Truth, these presigned URLs are generated using the grant_read_access Liquid filter and embedded into the task templates. Workers can then use these URLs to directly access the necessary data, such as images or documents, in their web browsers for annotation purposes.
While presigned URLs offer a convenient way to grant temporary access to S3 objects, sharing these URLs with people outside of the workteam can lead to unintended access of those objects. To mitigate this risk and enhance the security of SageMaker Ground Truth labeling tasks, we have introduced a new feature that adds an additional layer of security by restricting access to the presigned URLs to the worker’s IP address or virtual private cloud (VPC) endpoint from which they access the labeling task. In this blog post, we show you how to enable this feature, allowing you to enhance your data security as needed, and outline the success criteria for this feature, including the scenarios where it will be most beneficial.
Prerequisites
Before you get started configuring IP-restricted presigned URLs, the following resources can help you understand the background concepts:
- Amazon S3 presigned URL: This documentation covers the use of Amazon S3 presigned URLs, which provide temporary access to objects. Understanding how presigned URLs work will be helpful.
- Use Amazon SageMaker Ground Truth to label data: This guide explains how to use SageMaker Ground Truth for data labeling tasks, including setting up workteams and workforces. Familiarity with these concepts will be helpful when configuring IP restrictions for your workteams.
Introducing IP-restricted presigned URLs
Working closely with our customers, we recognized the need for an enhanced security posture and stricter access controls for presigned URLs. So, we introduced a new feature that uses the AWS global condition context keys aws:SourceIp and aws:VpcSourceIp to allow customers to restrict presigned URL access to specific IP addresses or VPC endpoints. By incorporating AWS Identity and Access Management (IAM) policy constraints, you can now restrict presigned URLs to only be accessible from an IP address or VPC endpoint of your choice. This IP-based access control effectively locks down the presigned URL to the worker’s location, mitigating the risk of unauthorized access or unintended sharing.
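To make the mechanism more concrete, the following Python sketch shows the general shape of an IAM policy statement conditioned on aws:SourceIp, together with a simplified local version of the same address check. The bucket ARN, the IP range, and both function names are illustrative assumptions; this post does not show the exact policy that SageMaker Ground Truth applies when it signs a URL.

```python
import ipaddress


def source_ip_statement(bucket_arn: str, allowed_cidr: str) -> dict:
    """Build an illustrative IAM statement allowing s3:GetObject only from allowed_cidr.

    This mirrors the documented IpAddress / aws:SourceIp condition syntax; it is
    not the policy Ground Truth generates internally (that is an assumption here).
    """
    return {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"{bucket_arn}/*",
        "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidr}},
    }


def ip_matches(request_ip: str, allowed_cidr: str) -> bool:
    """Simplified local stand-in for the condition check: is request_ip inside allowed_cidr?"""
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return ipaddress.ip_address(request_ip) in network


if __name__ == "__main__":
    # Hypothetical worker address, taken from a documentation-only IP range.
    statement = source_ip_statement("arn:aws:s3:::example-bucket", "203.0.113.7/32")
    print(statement)
    print(ip_matches("203.0.113.7", "203.0.113.7/32"))
    print(ip_matches("198.51.100.9", "203.0.113.7/32"))
```

The point of the sketch is that a URL forwarded to someone on a different network fails the address condition even though the signature itself is still valid.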
Benefits of the new feature
This update brings several significant security benefits to SageMaker Ground Truth:
- Enhanced data privacy: These IP restrictions limit presigned URLs to only be accessible from customer-approved locations, such as corporate VPNs, workers’ home networks, or designated VPC endpoints. Although the presigned URLs are pre-authenticated, this feature adds an additional layer of security by verifying the access location and locking the URL to that location until the task is completed.
- Reduced risk of unauthorized access: Implementing IP-based access controls minimizes the risk of data being accessed from unauthorized locations and mitigates the risk of data sharing outside the worker’s approved access network. This is particularly important when dealing with sensitive or confidential data.
- Flexible security options: You can apply these restrictions in either VPC or non-VPC settings, allowing you to tailor security measures to your organization’s specific needs.
- Auditing and compliance: By locking down presigned URLs to specific IP addresses or VPC endpoints, you can more easily track and audit access to your organization’s data, helping achieve compliance with internal policies and external regulations.
- Seamless integration: This new feature integrates with existing SageMaker Ground Truth workflows, providing enhanced security without disrupting established labeling processes or requiring significant changes to existing infrastructure.
By introducing IP-restricted presigned URLs, SageMaker Ground Truth gives you greater control over data access, so sensitive information remains accessible only to authorized workers within approved locations.
Configuring IP-restricted presigned URLs for SageMaker Ground Truth
The new IP restriction feature for presigned URLs in SageMaker Ground Truth can be enabled through the SageMaker API or the AWS Command Line Interface (AWS CLI). Before we go into the configuration of this new feature, let’s look at how you can create and update workteams today using the AWS CLI. You can also perform these operations through the SageMaker API using the AWS SDK.
Here’s an example of creating a new workteam using the create-workteam command:
aws sagemaker create-workteam \
    --description "A team for image labeling tasks" \
    --workforce-name "default" \
    --workteam-name "MyWorkteam" \
    --member-definitions '{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }'
To update an existing workteam, you use the update-workteam command:
aws sagemaker update-workteam \
    --workteam-name "MyWorkteam" \
    --description "Updated description for image labeling tasks"
Note that these examples only show a subset of the available parameters for the create-workteam and update-workteam APIs. You can find detailed documentation and examples in the SageMaker Ground Truth Developer Guide.
Enabling IP restrictions for presigned URLs
With the new IP restriction feature, you can now configure IP-based access constraints specific to each workteam when creating a new workteam or modifying an existing one. Here’s how you can enable these restrictions:
- When creating or updating a workteam, you can specify a WorkerAccessConfiguration object, which defines access constraints for the workers in that workteam.
- Within the WorkerAccessConfiguration, you can include an S3Presign object, which allows you to set access configurations for the presigned URLs used by the workers. Currently, only IamPolicyConstraints can be added to the S3Presign object. SageMaker Ground Truth provides two Liquid filters that you can use in your custom worker task templates to generate presigned URLs:
  - grant_read_access: This filter generates a presigned URL for the specified S3 object, granting temporary read access. It is applied as a Liquid filter to the S3 object reference in the template.
  - s3_presign: This new filter serves the same purpose as grant_read_access but makes it clear that the generated URL is subject to the S3Presign configuration defined for the workteam. It is applied in the template in the same way.
- The S3Presign object supports IamPolicyConstraints, where you can enable or disable the SourceIp and VpcSourceIp constraints:
  - SourceIp: When enabled, workers can access presigned URLs only from the specified IP addresses or ranges.
  - VpcSourceIp: When enabled, workers can access presigned URLs only from the specified VPC endpoints within your AWS account.
You can call the SageMaker ListWorkteams or DescribeWorkteam APIs to view workteams’ metadata, including the WorkerAccessConfiguration.
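As a rough illustration of reading that metadata with the AWS SDK for Python, the sketch below pulls the IamPolicyConstraints out of a DescribeWorkteam response. The helper names are hypothetical, and the nesting of WorkerAccessConfiguration under the Workteam key is an assumption based on the request shape used elsewhere in this post; check it against the API reference before relying on it.

```python
def presign_constraints(describe_response: dict) -> dict:
    """Return the IamPolicyConstraints from a DescribeWorkteam-style response.

    Returns an empty dict when the workteam has no S3Presign configuration.
    The response layout is assumed, not confirmed by this post.
    """
    workteam = describe_response.get("Workteam", {})
    access_config = workteam.get("WorkerAccessConfiguration", {})
    return access_config.get("S3Presign", {}).get("IamPolicyConstraints", {})


def fetch_presign_constraints(workteam_name: str) -> dict:
    """Call DescribeWorkteam and extract the constraints (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without the SDK installed

    client = boto3.client("sagemaker")
    response = client.describe_workteam(WorkteamName=workteam_name)
    return presign_constraints(response)


if __name__ == "__main__":
    # Hand-written sample that mimics the assumed response layout.
    sample = {
        "Workteam": {
            "WorkteamName": "exampleworkteam",
            "WorkerAccessConfiguration": {
                "S3Presign": {
                    "IamPolicyConstraints": {
                        "SourceIp": "Enabled",
                        "VpcSourceIp": "Disabled",
                    }
                }
            },
        }
    }
    print(presign_constraints(sample))
```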
Let’s say you want to create or update a workteam so that presigned URLs are restricted to the public IP address of the worker who originally accessed them.
Create workteam:
aws sagemaker create-workteam \
    --description "An example workteam with S3 presigned URLs restricted" \
    --workforce-name "default" \
    --workteam-name "exampleworkteam" \
    --member-definitions '{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }' \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'
Update workteam:
aws sagemaker update-workteam \
    --workteam-name "existingworkteam" \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'
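The same update can be sketched with the AWS SDK for Python. The helper below builds the WorkerAccessConfiguration structure shown in the CLI examples and passes it to update_workteam. The function names and their boolean parameters are hypothetical conveniences introduced for this example, not part of the SageMaker API, and the call itself needs boto3 and valid AWS credentials.

```python
import json


def worker_access_configuration(source_ip: bool, vpc_source_ip: bool) -> dict:
    """Build the WorkerAccessConfiguration structure used in the CLI examples."""

    def flag(enabled: bool) -> str:
        return "Enabled" if enabled else "Disabled"

    return {
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": flag(source_ip),
                "VpcSourceIp": flag(vpc_source_ip),
            }
        }
    }


def restrict_workteam(workteam_name: str, *, source_ip: bool = True,
                      vpc_source_ip: bool = False) -> dict:
    """Apply the constraints to an existing workteam through UpdateWorkteam."""
    import boto3  # imported lazily so the builder can be used without the SDK installed

    client = boto3.client("sagemaker")
    return client.update_workteam(
        WorkteamName=workteam_name,
        WorkerAccessConfiguration=worker_access_configuration(source_ip, vpc_source_ip),
    )


if __name__ == "__main__":
    # Print the JSON that would also be valid for --worker-access-configuration.
    print(json.dumps(worker_access_configuration(True, False), indent=2))
```

Keeping the builder separate from the API call makes it possible to review or log the exact configuration before it is applied to a workteam.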
Success criteria
While the IP-restricted presigned URLs feature provides enhanced security, there are scenarios where it might not be suitable. Understanding these limitations can help you make an informed decision about using the feature and verify that it aligns with your organization’s security needs and network configurations.
IP-restricted presigned URLs are effective in scenarios where the worker uses a consistent IP address to access both SageMaker Ground Truth and the S3 object. For example, if a worker accesses labeling tasks from a stable public IP address, such as an office network with a fixed IP address, the IP restriction will provide access with enhanced security. Similarly, when a worker accesses both SageMaker Ground Truth and S3 objects through the same VPC endpoint, the IP restriction will verify that the presigned URL is only accessible from within this VPC. In both scenarios, the consistent IP address enables the IP-based access controls to function correctly, providing an additional layer of security.
Scenarios where IP-restricted presigned URLs aren’t effective
Scenario | Description | Example | Exit criteria |
Asymmetric VPC endpoints | SageMaker Ground Truth is accessed through a public internet connection while Amazon S3 is accessed through a VPC endpoint, or vice versa. | Worker accesses SageMaker Ground Truth through the public internet but S3 through a VPC endpoint. | Verify that both SageMaker Ground Truth and S3 are accessed either entirely through the public internet or entirely through the same VPC endpoint. |
Network Address Translation (NAT) layers | NAT layers can alter the source IP address of requests, causing IP mismatches. Issues can arise from dynamically assigned IP addresses or asymmetric configurations. | Examples include: | Verify that the NAT gateway is configured to preserve the source IP address. Validate the NAT configuration for consistency when accessing both SageMaker Ground Truth and S3 resources. |
Use of VPNs | VPNs change the outgoing IP address, leading to potential access issues with IP-restricted presigned URLs. | Worker uses a split-tunnel VPN that changes the IP address for different requests to Ground Truth or S3, so access might be denied. | Disable the VPN or use a full-tunnel VPN that provides a consistent IP address for all requests. |
Interface endpoints aren’t supported by the grant_read_access feature because of their inability to resolve public DNS names. This limitation is orthogonal to the IP restrictions and should be considered when configuring your network setup for accessing S3 objects with presigned URLs. In such cases, use the S3 Gateway endpoint when accessing S3 to verify compatibility with the public DNS names generated by grant_read_access.
Using S3 access logs for debugging
To debug issues related to IP-restricted presigned URLs, S3 access logs can provide valuable insights. By enabling access logging for your S3 bucket, you can track every request made to your S3 objects, including the IP addresses from which the requests originate. This can help you identify:
- Mismatches between expected and actual IP addresses
- Dynamic IP addresses or VPNs causing access issues
- Unauthorized access from unexpected locations
To debug using S3 access logs, follow these steps:
- Enable S3 access logging: Configure your bucket to send access logs to another bucket or a logging service such as Amazon CloudWatch Logs.
- Review log files: Analyze the log files to identify patterns or anomalies in IP addresses, request timestamps, and error codes.
- Look for IP address changes: If you observe frequent changes in IP addresses within the logs, it might indicate that the worker’s IP address is dynamic or altered by a VPN or proxy.
- Check for NAT layer modifications: See if NAT layers are modifying the source IP address by checking the x-forwarded-for header in the log files.
- Verify authorized access: Confirm that requests are coming from approved and consistent IP addresses by checking the Remote IP field in the log files.
By following these steps and analyzing the S3 access logs, you can validate that the presigned URLs are accessed only from approved and consistent IP addresses.
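The review steps above can be partly automated. The following Python sketch parses the leading fields of S3 server access log records, groups the Remote IP values by object key, and lists requests that were rejected with HTTP 403. The sample records, bucket name, and function names are made up for illustration, and the regular expression covers only the first fields of the documented log format, so treat it as a starting point rather than a complete parser.

```python
import re
from collections import defaultdict

# Leading fields of an S3 server access log record, in their documented order.
LOG_PREFIX = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'(?P<request_uri>"[^"]*"|-) (?P<status>\S+) (?P<error_code>\S+)'
)

# Fabricated sample records that use documentation-only IP ranges.
SAMPLE_LOG = [
    'exampleowner example-bucket [06/Feb/2019:00:00:38 +0000] 203.0.113.7 - REQ1EXAMPLE '
    'REST.GET.OBJECT images/cat.jpg "GET /example-bucket/images/cat.jpg HTTP/1.1" '
    '200 - 1024 1024 12 10 "-" "Mozilla/5.0" -',
    'exampleowner example-bucket [06/Feb/2019:00:05:12 +0000] 198.51.100.9 - REQ2EXAMPLE '
    'REST.GET.OBJECT images/cat.jpg "GET /example-bucket/images/cat.jpg HTTP/1.1" '
    '403 AccessDenied 243 - 5 - "-" "Mozilla/5.0" -',
    'exampleowner example-bucket [06/Feb/2019:00:06:01 +0000] 203.0.113.7 - REQ3EXAMPLE '
    'REST.GET.OBJECT images/dog.jpg "GET /example-bucket/images/dog.jpg HTTP/1.1" '
    '200 - 2048 2048 15 11 "-" "Mozilla/5.0" -',
]


def parse_record(line: str):
    """Return the leading log fields as a dict, or None if the line does not match."""
    match = LOG_PREFIX.match(line)
    return match.groupdict() if match else None


def remote_ips_by_key(lines) -> dict:
    """Map each object key to the set of Remote IP values seen on GET requests."""
    seen = defaultdict(set)
    for line in lines:
        record = parse_record(line)
        if record and "GET.OBJECT" in record["operation"]:
            seen[record["key"]].add(record["remote_ip"])
    return dict(seen)


def keys_with_ip_changes(lines) -> dict:
    """Return only the keys that were requested from more than one IP address."""
    return {
        key: sorted(ips)
        for key, ips in remote_ips_by_key(lines).items()
        if len(ips) > 1
    }


def denied_requests(lines) -> list:
    """Return the parsed records whose HTTP status is 403."""
    records = (parse_record(line) for line in lines)
    return [record for record in records if record and record["status"] == "403"]


if __name__ == "__main__":
    print(keys_with_ip_changes(SAMPLE_LOG))
    for record in denied_requests(SAMPLE_LOG):
        print(record["time"], record["remote_ip"], record["key"], record["error_code"])
```

A key that appears with more than one Remote IP, especially alongside 403 responses, is a candidate for the dynamic IP, VPN, or NAT scenarios described earlier.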
Conclusion
The introduction of IP-restricted presigned URLs in Amazon SageMaker Ground Truth significantly enhances the security of data accessed through the service. By allowing you to restrict access to specific IP addresses or VPC endpoints, this feature helps facilitate more fine-tuned control of presigned URLs. It provides organizations with added protection for their sensitive data, offering a valuable option for those with stringent security requirements. We encourage you to explore this new security feature to protect your organization’s data and enhance the overall security of your labeling workflows. To get started with SageMaker Ground Truth, visit Getting Started. To implement IP restrictions on presigned URLs as part of your workteam setup, refer to the CreateWorkteam and UpdateWorkteam API documentation. Follow the guidance provided in this blog to configure these security measures effectively. For more information or assistance, contact your AWS account team or visit the SageMaker community forums.
About the Authors
Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers build scalable and cost-efficient AI/ML pipelines with Human in the Loop services. In his free time, Sundar loves traveling, sports, and enjoying outdoor activities with his family.
Michael Borde is a lead software engineer at Amazon AI, where he has been for seven years. He previously studied mathematics and computer science at the University of Chicago. Michael is passionate about cloud computing, distributed systems design, and digital privacy & security. After work, you can often find Michael putzing around the local powerlifting gym in Capitol Hill.
Jacky Shum is a Software Engineer at AWS in the SageMaker Ground Truth team. He works to help AWS customers leverage machine learning applications, including prior work on ML-based fraud detection with Amazon Fraud Detector.
Rohith Kodukula is a Software Development Engineer on the SageMaker Ground Truth team. In his free time he enjoys staying active and reading up on anything that he finds mildly interesting (most things really).
Abhinay Sandeboina is an Engineering Manager at AWS Human In The Loop (HIL). He has been in AWS for over 2 years and his teams are responsible for managing ML platform services. He has a decade of experience in software/ML engineering building infrastructure platforms at scale. Prior to AWS, he worked in various engineering management roles at Zillow and Capital One.