ADD base domain information on Index page and API endpoint

This commit is contained in:
CaffeineFueled 2025-04-09 19:54:24 +02:00
parent fc72f6f51c
commit 7db919bcb7
3 changed files with 270 additions and 27 deletions

View file

@ -85,11 +85,39 @@ Where columns are:
4. Record Type (A, AAAA, MX, CNAME, TXT, etc.) 4. Record Type (A, AAAA, MX, CNAME, TXT, etc.)
5. Record Data (IP address, hostname, or other data depending on record type) 5. Record Data (IP address, hostname, or other data depending on record type)
## Domain Base Name Detection
The application includes functionality to identify base domains from fully qualified domain names, including handling of multi-part TLDs like ".co.uk" or ".com.au".
### Multi-Part TLD List
The application uses a hardcoded list of common multi-part TLDs to correctly extract base domains (e.g., "example.co.uk" from "mail.example.co.uk").
This list can be found in `main.py` as `MULTI_PART_TLDS`.
### Updating the TLD List
To ensure accurate domain parsing, you should periodically update the multi-part TLD list. The best sources for this information are:
1. **Public Suffix List (PSL)**: The most comprehensive and authoritative source
- Website: https://publicsuffix.org/list/
- GitHub: https://github.com/publicsuffix/list
- This list is maintained by Mozilla and used by browsers and DNS applications
2. **IANA's TLD Database**: The official registry of top-level domains
- Website: https://www.iana.org/domains/root/db
3. **Commercial Domain Registrars**: Often provide lists of available TLDs
- Examples: GoDaddy, Namecheap, etc.
For the most accurate and comprehensive implementation, consider implementing a parser for the Public Suffix List or using a library that maintains this list (e.g., `publicsuffix2` for Python).
## API Endpoints ## API Endpoints
- `/api/uploads` - Get all uploads - `/api/uploads` - Get all uploads
- `/api/slds` - Get all SLDs (Second Level Domains) - `/api/domains` - Get all domains
- `/api/slds/{sld}` - Get domains by SLD - `/api/base-domains` - Get only unique base domains (e.g., example.com, example.co.uk) with simplified response format
- `/api/domains/{domain}` - Get domains by name
- `/api/dns` - Get all DNS records - `/api/dns` - Get all DNS records
- `/api/dns/types` - Get unique values for filters - `/api/dns/types` - Get unique values for filters
@ -100,8 +128,27 @@ You can filter the API results using the following query parameters:
- `upload_id` - Filter by specific upload - `upload_id` - Filter by specific upload
- `record_type` - Filter by DNS record type - `record_type` - Filter by DNS record type
- `record_class` - Filter by DNS record class - `record_class` - Filter by DNS record class
- `tld` - Filter by Top Level Domain
- `sld` - Filter by Second Level Domain
- `domain` - Search by domain name - `domain` - Search by domain name
- `base_domains_only` - Only show base domains (e.g., example.com not mail.example.com)
- `deduplicate` - For DNS records, control whether to show all records or deduplicate
Example: `/api/dns?record_type=A&tld=com&upload_id=upload_20250408120000` Examples:
- `/api/domains?base_domains_only=true` - Show only base domains
- `/api/base-domains` - Get a simplified list of unique base domains
- `/api/dns?record_type=A&domain=example.com&deduplicate=false` - Show all A records for example.com without deduplication
### Response Format Examples
1. Base Domains Endpoint (`/api/base-domains`):
```json
[
{
"domain": "example.com",
"timestamp": "2025-04-08T12:00:00"
},
{
"domain": "example.co.uk",
"timestamp": "2025-04-08T12:00:00"
}
]
```

140
main.py
View file

@ -139,23 +139,42 @@ async def process_csv_upload(file_content, upload_id, description=None):
print(traceback.format_exc()) print(traceback.format_exc())
return 0, 0 return 0, 0
# Load domains from database - deduplicated by full domain name # Load domains from database - deduplicated by full domain name, with optional base domain filtering
def load_domains(specific_upload_id: str = None) -> List[Dict]: def load_domains(specific_upload_id: str = None, base_domains_only: bool = False) -> List[Dict]:
try: try:
domains = domains_table.all() domains = domains_table.all()
# If a specific upload ID is provided, only show domains from that upload # If a specific upload ID is provided, only show domains from that upload
if specific_upload_id: if specific_upload_id:
domains = [d for d in domains if d.get('upload_id') == specific_upload_id] domains = [d for d in domains if d.get('upload_id') == specific_upload_id]
if not base_domains_only:
return domains return domains
# Add the base_domain field to each domain
for domain in domains:
domain['base_domain'] = extract_base_domain(domain.get('full_domain', ''))
# Sort by timestamp in descending order (newest first) # Sort by timestamp in descending order (newest first)
domains.sort(key=lambda x: x.get('timestamp', ''), reverse=True) domains.sort(key=lambda x: x.get('timestamp', ''), reverse=True)
# Create a dictionary to track unique domains by full domain name # Create a dictionary to track unique domains
unique_domains = {} unique_domains = {}
base_domains_set = set()
# First pass: collect all base domains
if base_domains_only:
for domain in domains:
base_domains_set.add(domain.get('base_domain', ''))
for domain in domains: for domain in domains:
# If base_domains_only is True, only keep domains that are base domains themselves
if base_domains_only:
full_domain = domain.get('full_domain', '')
base_domain = domain.get('base_domain', '')
if full_domain != base_domain:
continue
# Create a unique key based on the full domain name # Create a unique key based on the full domain name
unique_key = domain.get('full_domain', '') unique_key = domain.get('full_domain', '')
@ -206,6 +225,90 @@ def load_dns_entries(specific_upload_id: str = None, deduplicate: bool = False)
print(f"Error loading DNS records from database: {e}") print(f"Error loading DNS records from database: {e}")
return [] return []
# List of known multi-part TLDs
MULTI_PART_TLDS = [
'co.uk', 'org.uk', 'me.uk', 'ac.uk', 'gov.uk', 'net.uk', 'sch.uk',
'com.au', 'net.au', 'org.au', 'edu.au', 'gov.au', 'asn.au', 'id.au',
'co.nz', 'net.nz', 'org.nz', 'govt.nz', 'ac.nz', 'school.nz', 'geek.nz',
'com.sg', 'edu.sg', 'gov.sg', 'net.sg', 'org.sg', 'per.sg',
'co.za', 'org.za', 'web.za', 'net.za', 'gov.za', 'ac.za',
'com.br', 'net.br', 'org.br', 'gov.br', 'edu.br',
'co.jp', 'ac.jp', 'go.jp', 'or.jp', 'ne.jp', 'gr.jp',
'co.in', 'firm.in', 'net.in', 'org.in', 'gen.in', 'ind.in',
'edu.cn', 'gov.cn', 'net.cn', 'org.cn', 'com.cn', 'ac.cn',
'com.mx', 'net.mx', 'org.mx', 'edu.mx', 'gob.mx'
]
# Extract the base domain (SLD+TLD) from a full domain name
def extract_base_domain(domain: str) -> str:
if not domain:
return domain
# Remove trailing dot if present
if domain.endswith('.'):
domain = domain[:-1]
parts = domain.split('.')
# Check if the domain has enough parts
if len(parts) <= 1:
return domain
# Check for known multi-part TLDs first
for tld in MULTI_PART_TLDS:
tld_parts = tld.split('.')
if len(parts) > len(tld_parts) and '.'.join(parts[-len(tld_parts):]) == tld:
# The domain has a multi-part TLD, extract SLD + multi-part TLD
return parts[-len(tld_parts)-1] + '.' + tld
# Default case: extract last two parts
if len(parts) > 1:
return '.'.join(parts[-2:])
return domain
# Get all unique base domains from the database
def get_unique_base_domains(specific_upload_id: str = None) -> List[Dict]:
try:
domains = domains_table.all()
# If a specific upload ID is provided, only show domains from that upload
if specific_upload_id:
domains = [d for d in domains if d.get('upload_id') == specific_upload_id]
# Add the base_domain field to each domain
for domain in domains:
domain['base_domain'] = extract_base_domain(domain.get('full_domain', ''))
# Sort by timestamp in descending order (newest first)
domains.sort(key=lambda x: x.get('timestamp', ''), reverse=True)
# Create dictionaries to track unique base domains
unique_base_domains = {}
# Process each domain and keep only unique base domains
for domain in domains:
base_domain = domain.get('base_domain', '')
# Skip if no base domain
if not base_domain:
continue
# Check if this base domain has been seen before
if base_domain not in unique_base_domains:
# Create a new entry for this base domain - with simplified fields
base_domain_entry = {
'domain': base_domain,
'timestamp': domain.get('timestamp')
}
unique_base_domains[base_domain] = base_domain_entry
# Return the list of unique base domains
return list(unique_base_domains.values())
except Exception as e:
print(f"Error getting unique base domains: {e}")
return []
# Get unique values for filter dropdowns # Get unique values for filter dropdowns
def get_unique_values(entries: List[Dict]) -> Dict[str, Set]: def get_unique_values(entries: List[Dict]) -> Dict[str, Set]:
unique_values = { unique_values = {
@ -249,16 +352,21 @@ def delete_upload(upload_id):
# Routes # Routes
@app.get("/", response_class=HTMLResponse) @app.get("/", response_class=HTMLResponse)
async def home(request: Request, upload_id: Optional[str] = None): async def home(
"""Home page with upload form and SLD listing""" request: Request,
domains = load_domains(upload_id) upload_id: Optional[str] = None,
base_domains_only: Optional[bool] = False
):
"""Home page with upload form and domain listing"""
domains = load_domains(upload_id, base_domains_only)
uploads = get_uploads() uploads = get_uploads()
return templates.TemplateResponse( return templates.TemplateResponse(
"index.html", "index.html",
{ {
"request": request, "request": request,
"domains": domains, "domains": domains,
"uploads": uploads "uploads": uploads,
"base_domains_only": base_domains_only
} }
) )
@ -370,12 +478,22 @@ async def get_all_uploads():
return get_uploads() return get_uploads()
@app.get("/api/domains", response_model=List[Dict]) @app.get("/api/domains", response_model=List[Dict])
async def get_domains(upload_id: Optional[str] = None): async def get_domains(
"""API endpoint that returns all domains with optional filter by upload_id""" upload_id: Optional[str] = None,
# The load_domains function now handles deduplication and upload_id filtering base_domains_only: Optional[bool] = False
domains = load_domains(upload_id) ):
"""API endpoint that returns all domains with optional filtering"""
# The load_domains function handles deduplication and filtering
domains = load_domains(upload_id, base_domains_only)
return domains return domains
@app.get("/api/base-domains", response_model=List[Dict])
async def get_base_domains(upload_id: Optional[str] = None):
"""API endpoint that returns only unique base domains"""
# Get only the unique base domains
base_domains = get_unique_base_domains(upload_id)
return base_domains
@app.get("/api/domains/{domain}", response_model=List[Dict]) @app.get("/api/domains/{domain}", response_model=List[Dict])
async def get_domains_by_name(domain: str, upload_id: Optional[str] = None): async def get_domains_by_name(domain: str, upload_id: Optional[str] = None):
"""API endpoint that returns domains matching a specific domain name with optional filter by upload_id""" """API endpoint that returns domains matching a specific domain name with optional filter by upload_id"""

View file

@ -61,6 +61,23 @@
font-size: 0.9em; font-size: 0.9em;
color: #0f5132; color: #0f5132;
} }
.base-domain-badge {
display: inline-block;
padding: 3px 7px;
background-color: #cfe2ff;
border-radius: 4px;
font-size: 0.9em;
color: #0a58ca;
}
.same-domain-badge {
display: inline-block;
padding: 3px 7px;
background-color: #e9ecef;
border-radius: 4px;
font-size: 0.9em;
color: #6c757d;
font-style: italic;
}
.api-section { .api-section {
margin-top: 30px; margin-top: 30px;
padding: 15px; padding: 15px;
@ -127,12 +144,48 @@
} }
.filter-form { .filter-form {
margin-bottom: 20px; margin-bottom: 20px;
background-color: #f9f9f9;
padding: 15px;
border-radius: 5px;
}
.filter-row {
display: flex;
flex-wrap: wrap;
gap: 15px;
align-items: flex-end;
}
.filter-group {
display: flex;
flex-direction: column;
}
.filter-group label {
font-weight: bold;
margin-bottom: 5px;
font-size: 0.9em;
} }
.filter-select { .filter-select {
padding: 8px 12px; padding: 8px 12px;
border: 1px solid #ddd; border: 1px solid #ddd;
border-radius: 4px; border-radius: 4px;
margin-right: 10px; min-width: 150px;
}
.btn-sm {
padding: 8px 16px;
font-size: 0.9em;
}
.reset-button {
display: inline-block;
padding: 8px 16px;
background-color: #f44336;
color: white;
text-decoration: none;
border-radius: 4px;
font-weight: bold;
font-size: 0.9em;
}
.reset-button:hover {
background-color: #e53935;
color: white;
} }
</style> </style>
</head> </head>
@ -199,8 +252,10 @@
<div class="filter-form"> <div class="filter-form">
<h2>Domain List</h2> <h2>Domain List</h2>
<form id="filterForm" method="get"> <form id="filterForm" method="get">
<div class="filter-row">
<div class="filter-group">
<label for="upload_filter">Filter by upload:</label> <label for="upload_filter">Filter by upload:</label>
<select id="upload_filter" name="upload_id" class="filter-select" onchange="this.form.submit()"> <select id="upload_filter" name="upload_id" class="filter-select">
<option value="">All uploads</option> <option value="">All uploads</option>
{% for upload in uploads %} {% for upload in uploads %}
<option value="{{ upload.id }}" {% if request.query_params.get('upload_id') == upload.id %}selected{% endif %}> <option value="{{ upload.id }}" {% if request.query_params.get('upload_id') == upload.id %}selected{% endif %}>
@ -208,6 +263,21 @@
</option> </option>
{% endfor %} {% endfor %}
</select> </select>
</div>
<div class="filter-group">
<label for="base_domains_only">Show base domains only:</label>
<select id="base_domains_only" name="base_domains_only" class="filter-select">
<option value="false" {% if request.query_params.get('base_domains_only', 'false') == 'false' %}selected{% endif %}>No (Show All)</option>
<option value="true" {% if request.query_params.get('base_domains_only') == 'true' %}selected{% endif %}>Yes (example.com only)</option>
</select>
</div>
<div class="filter-buttons">
<button type="submit" class="btn btn-sm">Apply Filters</button>
<a href="/" class="reset-button">Reset</a>
</div>
</div>
</form> </form>
</div> </div>
@ -215,8 +285,10 @@
<h3>API Endpoints</h3> <h3>API Endpoints</h3>
<p>Get all uploads: <code>/api/uploads</code></p> <p>Get all uploads: <code>/api/uploads</code></p>
<p>Get all domains: <code>/api/domains</code></p> <p>Get all domains: <code>/api/domains</code></p>
<p>Get only base domains: <code>/api/base-domains</code> (simplified format: <code>{"domain": "example.com", "timestamp": "..."}</code>)</p>
<p>Get domains by name: <code>/api/domains/{domain}</code></p> <p>Get domains by name: <code>/api/domains/{domain}</code></p>
<p>Filter by upload: <code>/api/domains?upload_id={upload_id}</code></p> <p>Filter by upload: <code>/api/domains?upload_id={upload_id}</code></p>
<p>Show base domains only: <code>/api/domains?base_domains_only=true</code></p>
</div> </div>
{% if domains %} {% if domains %}
@ -225,6 +297,9 @@
<thead> <thead>
<tr> <tr>
<th>Domain</th> <th>Domain</th>
{% if not base_domains_only %}
<th>Base Domain</th>
{% endif %}
<th>Upload Date</th> <th>Upload Date</th>
</tr> </tr>
</thead> </thead>
@ -232,6 +307,9 @@
{% for item in domains %} {% for item in domains %}
<tr> <tr>
<td><span class="domain-badge">{{ item.full_domain }}</span></td> <td><span class="domain-badge">{{ item.full_domain }}</span></td>
{% if not base_domains_only %}
<td>{% if item.base_domain != item.full_domain %}<span class="base-domain-badge">{{ item.base_domain }}</span>{% else %}<span class="same-domain-badge">Same as domain</span>{% endif %}</td>
{% endif %}
<td>{{ item.timestamp.replace('T', ' ').split('.')[0] if item.get('timestamp') else 'N/A' }}</td> <td>{{ item.timestamp.replace('T', ' ').split('.')[0] if item.get('timestamp') else 'N/A' }}</td>
</tr> </tr>
{% endfor %} {% endfor %}